Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments

Abstract

The main problems in sentiment analysis models for Indonesian YouTube comments are unstructured data and low classification accuracy. Sentiment analysis for Indonesian, which differs from English, requires appropriate preprocessing and classification methods. Previous research has usually used Linear Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. Although SVM is more accurate than the other algorithms, its accuracy still needs to be improved. This study compares the performance of tree-based ensemble methods and feature selection in order to improve the sentiment analysis model for Indonesian YouTube comments. We crawled Indonesian YouTube comments from different domains and produced ten datasets. Preprocessing consisted of stopword removal, slang-word conversion, and stemming. For feature selection, we tested two vectorizer methods: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The models were built with six machine learning algorithms: four tree-based ensemble methods, chosen to improve accuracy, plus Linear SVM and Decision Tree as comparisons. Random Forest and Extra Tree represent the bagging ensembles, while AdaBoost and Gradient Boosting represent the boosting ensembles. Across all experiments combining feature selection and ensemble machine learning, the best methods were Extra Tree, with an accuracy of 93.39%, and AdaBoost, with an accuracy of 92.53%. The choice of vectorizer, TF or TF-IDF, did not significantly affect classification accuracy.
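The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn (the abstract does not name a library), treats raw term counts from `CountVectorizer` as the TF representation, uses default-style hyperparameters, and runs on a handful of made-up, already preprocessed comments rather than the study's ten datasets. The `compare` helper and all variable names are hypothetical.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy comments, assumed to be already preprocessed
# (stopwords removed, slang converted, stemmed). Not data from the study.
comments = [
    "video bagus sekali suka", "keren mantap suka banget",
    "bagus keren inspiratif", "mantap bagus terima kasih",
    "suka video keren", "inspiratif mantap bagus",
    "video jelek bosan", "buruk kecewa sekali",
    "jelek buruk tidak suka", "bosan kecewa jelek",
    "tidak suka video buruk", "kecewa bosan buruk",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

# Two vectorizers: raw term counts (assumed here to stand in for TF) and TF-IDF.
vectorizers = {
    "TF": CountVectorizer(),
    "TF-IDF": TfidfVectorizer(),
}

# Six classifiers: two bagging-style ensembles, two boosting ensembles,
# and the two comparison models named in the abstract.
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Linear SVM": LinearSVC(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}


def compare(texts, y, cv=3):
    """Return mean cross-validated accuracy for every vectorizer/classifier pair."""
    results = {}
    for vec_name, vectorizer in vectorizers.items():
        for clf_name, classifier in classifiers.items():
            # cross_val_score clones the pipeline for each fold, so the
            # vectorizer is fitted on the training folds only.
            pipeline = make_pipeline(vectorizer, classifier)
            scores = cross_val_score(pipeline, texts, y, cv=cv, scoring="accuracy")
            results[(vec_name, clf_name)] = scores.mean()
    return results


results = compare(comments, labels)
for (vec_name, clf_name), accuracy in sorted(results.items()):
    print(f"{vec_name:7s} {clf_name:18s} {accuracy:.3f}")
```

Placing the vectorizer inside the pipeline keeps vocabulary and IDF weights from leaking out of the held-out fold, which matters when the question is whether TF and TF-IDF differ at all. Accuracies on this toy data say nothing about the figures reported in the abstract.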

Cite

Khomsah, S., Hidayatullah, A. F., & Aribowo, A. S. (2021). Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments. In Lecture Notes in Electrical Engineering (Vol. 746 LNEE, pp. 161–172). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-33-6926-9_15
