Cluster Analysis on Covid-19 Outbreak Sentiments from Twitter Data using K-means Algorithm

Adnan Hussein; Farzana Kabir Ahmad; Siti Sakira Kamaruddin

Journal Article (Open Access)

Journal of System and Management Sciences (2021) 11(4) 167-189

DOI: 10.33168/JSMS.2021.0409

Abstract

With remarkable advances in the Internet and computing technologies, social media has become a prominent platform for sharing opinions. COVID-19 is considered one of the major crises in the world today, and people use social media to express their thoughts about it and about the actions taken to control it. There is a pressing need to discover and understand public sentiment related to COVID-19 so that decision makers and governments have better insight when making decisions. Several studies have explored sentiment analysis of the COVID-19 outbreak; however, most of them have used a classification approach, which requires the data to be manually labelled. When analyzing large volumes of data, labelling can be an intricate, expert-dependent task. This study aims to explore COVID-19 pandemic sentiment using a clustering approach. The data are obtained by crawling COVID-19-related posts from Twitter. The crawled data are pre-processed, and terms are extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Singular Value Decomposition (SVD) is then used to reduce irrelevant features. The K-means algorithm is employed to cluster the tweets into k clusters. The clusters are plotted using t-Distributed Stochastic Neighbour Embedding (t-SNE), and lexicon-based sentiment analysis is applied to discover the sentiment of each cluster. The results yielded 9 clusters covering different topics, with the highest reported score being 83.25% positivity and 16.75% negativity. Dominant topics are explored using word clouds, and the clustering results were evaluated with a Silhouette coefficient of 0.0070. For future work, this study suggests using other word embedding techniques for data representation to deal with the sparsity and high dimensionality of textual data.
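
The abstract does not include the authors' code, so the following is only a minimal sketch of the kind of pipeline it describes, written with the Python standard library. It covers TF-IDF weighting, K-means clustering, the Silhouette coefficient, and a lexicon-based polarity share per cluster; the SVD and t-SNE steps are omitted. The sample tweets, the two word lists, the smoothed IDF formula, and all function names are illustrative assumptions, not material from the paper.

```python
import math
import random
from collections import Counter, defaultdict

def tfidf(docs):
    """Return L2-normalised TF-IDF vectors as sparse dicts (term -> weight)."""
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    df = Counter(t for toks in tokenised for t in set(toks))
    vectors = []
    for toks in tokenised:
        # Smoothed IDF (an assumption; the paper's exact variant is not stated).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dist(a, b):
    """Euclidean distance between two sparse vectors."""
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2
                         for k in set(a) | set(b)))

def mean(vectors):
    """Component-wise mean of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iters=50, seed=0):
    """Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    centroids = random.Random(seed).sample(vectors, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels

def silhouette(vectors, labels):
    """Mean of (b - a) / max(a, b) over all points; 0 for degenerate cases."""
    scores = []
    for i, v in enumerate(vectors):
        by_cluster = defaultdict(list)
        for j, u in enumerate(vectors):
            if i != j:
                by_cluster[labels[j]].append(dist(v, u))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy lexicon; a real study would use an established one.
POSITIVE = {"safe", "hope", "recover", "thanks"}
NEGATIVE = {"fear", "death", "crisis", "lockdown"}

def cluster_polarity(docs, labels):
    """Per cluster: (positive %, negative %) of lexicon hits."""
    hits = defaultdict(lambda: [0, 0])
    for doc, label in zip(docs, labels):
        for tok in doc.lower().split():
            hits[label][0] += tok in POSITIVE
            hits[label][1] += tok in NEGATIVE
    return {c: (100.0 * p / (p + n), 100.0 * n / (p + n))
            for c, (p, n) in hits.items() if p + n}

# Made-up example tweets, for illustration only.
tweets = [
    "vaccine rollout gives hope",
    "hope the vaccine keeps us safe",
    "thanks to nurses for vaccine hope",
    "lockdown fear and crisis again",
    "crisis deepens as lockdown returns",
    "fear of death during lockdown",
]
vectors = tfidf(tweets)
labels = kmeans(vectors, k=2)
score = silhouette(vectors, labels)
polarity = cluster_polarity(tweets, labels)
print(labels, round(score, 4), polarity)
```

On a real corpus a library such as scikit-learn would be the more practical choice, since it also provides truncated SVD and t-SNE; the pure-Python version here is only meant to make each step explicit.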

Citation (APA)

Hussein, A., Ahmad, F. K., & Kamaruddin, S. S. (2021). Cluster Analysis on Covid-19 Outbreak Sentiments from Twitter Data using K-means Algorithm. Journal of System and Management Sciences, 11(4), 167–189. https://doi.org/10.33168/JSMS.2021.0409
