Active Learning on Indonesian Twitter Sentiment Analysis Using Uncertainty Sampling

Abstract

Sentiment analysis research on social media is developing rapidly. Sentiment analysis is typically framed as a supervised learning task, which requires annotated data, and the annotation process is notoriously time-consuming. Active learning has emerged as an effective strategy for this problem: only a small subset of the dataset is labeled up front, and a sampling strategy selects which of the remaining examples should be annotated next. This study compares two active learning query strategies, random sampling and margin sampling (an uncertainty-based strategy), applied to machine learning models such as logistic regression and random forests. We also evaluate the model performance and data savings achieved with these strategies for traditional machine learning on a Twitter sentiment dataset with two labels, positive and negative. The results show that an uncertainty sampling strategy can significantly reduce the amount of training data required, saving up to 65% of the total training data needed to reach peak model accuracy. The best model obtained in this experiment is a random forest with a margin sampling strategy, yielding an accuracy of 81.12% and an F1 score of 88.60%. These results highlight the effectiveness of active learning for social media sentiment analysis and its potential to improve both model performance and resource efficiency, particularly when random forest models are combined with margin sampling.
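The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic features in place of vectorized tweets, and the function names (margin_scores, select_queries, active_learning_loop), the batch size, and the seed-set size are hypothetical choices made only for illustration. Margin sampling is taken here to mean querying the examples with the smallest gap between the two highest predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def margin_scores(proba):
    """Gap between the two highest class probabilities (small = uncertain)."""
    ordered = np.sort(proba, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def select_queries(proba, batch_size, strategy="margin", rng=None):
    """Return positions (within the unlabeled pool) to send for annotation."""
    if strategy == "random":
        rng = rng or np.random.default_rng(0)
        return rng.choice(len(proba), size=batch_size, replace=False)
    # Margin sampling: take the examples with the smallest margins.
    return np.argsort(margin_scores(proba))[:batch_size]


def active_learning_loop(X_pool, y_pool, X_test, y_test, strategy="margin",
                         n_initial=20, batch_size=20, n_rounds=10, seed=0):
    """Simulate annotation: y_pool plays the role of the human annotator.

    Assumes the random seed set contains both classes.
    """
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X_pool), size=n_initial, replace=False)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    history = []  # (number of labeled examples, test accuracy) per round
    for _ in range(n_rounds):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_pool[labeled], y_pool[labeled])
        history.append((len(labeled), model.score(X_test, y_test)))
        proba = model.predict_proba(X_pool[unlabeled])
        picked = select_queries(proba, batch_size, strategy, rng)
        labeled = np.concatenate([labeled, unlabeled[picked]])
        unlabeled = np.delete(unlabeled, picked)
    return history


if __name__ == "__main__":
    # Synthetic two-class data standing in for a positive/negative tweet set.
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    for strategy in ("random", "margin"):
        hist = active_learning_loop(X[:400], y[:400], X[400:], y[400:],
                                    strategy=strategy, n_rounds=5)
        print(strategy, hist)
```

Comparing the two histories shows how many labeled examples each strategy needs to reach a given accuracy, which is the kind of data-savings comparison the abstract reports. The numbers produced on synthetic data say nothing about the paper's reported results.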

Cite

Liebenlito, M., Inayah, N., Choerunnisa, E., Sutanto, T. E., & Inna, S. (2024). Active Learning on Indonesian Twitter Sentiment Analysis Using Uncertainty Sampling. Journal of Applied Data Sciences, 5(1), 114–121. https://doi.org/10.47738/jads.v5i1.144
