Cluster Analysis on Covid-19 Outbreak Sentiments from Twitter Data using K-means Algorithm

Abstract

With the remarkable advances in the Internet and recent computer technologies, social media has become a prominent platform for sharing opinions. COVID-19 is currently considered one of the major crises in the world. People use social media to express their thoughts about COVID-19 and the actions taken to control it. There is a pressing need to discover and understand public sentiment related to COVID-19 in order to give decision makers and governments better insights for making accurate decisions. Several studies have explored sentiment analysis of the COVID-19 outbreak; however, most have used a classification approach, which requires the data to be manually labelled. When analyzing large amounts of data, labelling can be an intricate and expert-dependent task. This study aims to explore COVID-19 pandemic sentiment using a clustering approach. The data are obtained by crawling COVID-19-related posts from Twitter. The crawled data are pre-processed, and terms are extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Singular Value Decomposition (SVD) is then used to reduce irrelevant features. The K-means algorithm is employed to cluster the tweets into k clusters. The results of each cluster are plotted using t-Distributed Stochastic Neighbour Embedding (t-SNE), and lexicon-based sentiment analysis is applied to discover the sentiments of these clusters. The results show that 9 clusters covering different topics were obtained, with the highest scores reported being 83.25% positivity and 16.75% negativity. Dominant topics are explored using word clouds, and the clustering results are evaluated with a Silhouette coefficient of 0.0070. For future work, this study suggests using other word embedding techniques as the data representation to deal with the sparsity and high dimensionality of textual data.
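The pipeline described in the abstract (TF-IDF, SVD, K-means, lexicon-based sentiment, Silhouette evaluation) can be sketched roughly as follows with scikit-learn. This is a minimal illustration and not the authors' implementation: the sample tweets, the toy `POSITIVE`/`NEGATIVE` lexicon, the helper names `cluster_tweets` and `cluster_sentiment`, and all parameter values (`k=2`, `n_components=2`, `seed=0`) are assumptions made for the example. The t-SNE plotting and word cloud steps are omitted.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical stand-ins for crawled, pre-processed tweets.
tweets = [
    "vaccine rollout gives hope to many people",
    "grateful for the vaccine and the nurses",
    "vaccine news brings hope and relief",
    "lockdown rules are bad for small business",
    "angry about the lockdown and lost jobs",
    "lockdown fear and lost income again",
]

# Hypothetical toy lexicon; a real study would use an established one.
POSITIVE = {"hope", "grateful", "relief"}
NEGATIVE = {"bad", "angry", "fear", "lost"}


def cluster_tweets(texts, k, n_components=2, seed=0):
    """TF-IDF features -> SVD reduction -> K-means cluster labels."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    reduced = svd.fit_transform(tfidf)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    return reduced, labels


def cluster_sentiment(texts, labels):
    """Return {cluster: (positive %, negative %)} from lexicon word counts."""
    counts = {}
    for text, label in zip(texts, labels):
        pos, neg = counts.get(int(label), (0, 0))
        words = text.split()
        pos += sum(word in POSITIVE for word in words)
        neg += sum(word in NEGATIVE for word in words)
        counts[int(label)] = (pos, neg)
    result = {}
    for cluster, (pos, neg) in counts.items():
        total = pos + neg
        if total == 0:
            result[cluster] = (0.0, 0.0)  # no lexicon words in this cluster
        else:
            result[cluster] = (100.0 * pos / total, 100.0 * neg / total)
    return result


reduced, labels = cluster_tweets(tweets, k=2)
score = silhouette_score(reduced, labels)  # ranges from -1 to 1
sentiment = cluster_sentiment(tweets, labels)

print("labels:", labels.tolist())
print("silhouette:", round(float(score), 4))
print("sentiment per cluster:", sentiment)
```

A Silhouette coefficient near 0, such as the 0.0070 reported above, generally indicates overlapping clusters, which is consistent with the abstract's remark about the sparsity and high dimensionality of textual data.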

Cite

Hussein, A., Ahmad, F. K., & Kamaruddin, S. S. (2021). Cluster Analysis on Covid-19 Outbreak Sentiments from Twitter Data using K-means Algorithm. Journal of System and Management Sciences, 11(4), 167–189. https://doi.org/10.33168/JSMS.2021.0409
