Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach

C. Tho; Y. Heryadi; L. Lukas; A. Wibowo

Conference Proceedings (Open Access)


Journal of Physics: Conference Series (2021) 1869(1)

DOI: 10.1088/1742-6596/1869/1/012084

17 citations · 58 readers

Abstract

Mixing one language with another in spoken or written communication has become common practice among bilingual speakers, in daily conversation as well as on social media. The lexicon-based approach is one way of performing sentiment analysis. This study aims to compare two lexicon models, SentiWordNet and VADER, in extracting the polarity of code-mixed Indonesian and Javanese sentences. A total of 3,963 tweets were gathered from two accounts that post code-mixed tweets. The tweets were pre-processed by removing duplicates, translating them to English, filtering special characters, converting them to lower case, and filtering stop words. Positive and negative word scores from each lexicon model were then calculated with a simple mathematical formula to classify polarity. Compared against manual labelling, SentiWordNet performed better than VADER on negative sentiment; however, neither lexicon model performed well on neutral or positive sentiment. Overall, VADER performed better than SentiWordNet. The study attributes the misclassifications to the fact that most of the Indonesian and Javanese text consisted of words that both lexicon models considered positive.
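The pipeline described in the abstract (pre-processing, summing positive and negative word scores, then comparing predictions with manual labels) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the toy lexicon, its scores, the stop-word list, and the decision rule "positive if the positive total exceeds the negative total, negative if the reverse, otherwise neutral" are all assumptions, because the abstract does not give the exact formula. The lexicon loosely mimics SentiWordNet-style positive/negative score pairs, and the translation step is skipped by assuming the tweets are already in English.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (positive score, negative score).
# A real run would look words up in SentiWordNet or VADER; these values are invented.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.0, 0.75),
    "late": (0.0, 0.25),
}

# Small illustrative stop-word list (assumption; not the list used in the study).
STOP_WORDS = {"the", "a", "an", "is", "am", "i", "to", "and", "very"}


def preprocess(tweets):
    """Deduplicate, strip special characters, lowercase, and drop stop words.

    Assumes the tweets have already been translated to English.
    """
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet in seen:  # remove exact duplicates
            continue
        seen.add(tweet)
        # Replace anything that is not a letter or whitespace, then lowercase.
        text = re.sub(r"[^A-Za-z\s]", " ", tweet).lower()
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned


def classify(tokens, lexicon=TOY_LEXICON):
    """Label tokens by comparing summed positive and negative lexicon scores (assumed rule)."""
    pos = sum(lexicon[t][0] for t in tokens if t in lexicon)
    neg = sum(lexicon[t][1] for t in tokens if t in lexicon)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal totals, including no lexicon hits at all


def accuracy(predicted, gold):
    """Fraction of predictions that match the manual labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def per_class_recall(predicted, gold):
    """Share of each manually labelled class that was predicted correctly."""
    totals = Counter(gold)
    hits = Counter(g for p, g in zip(predicted, gold) if p == g)
    return {label: hits[label] / totals[label] for label in totals}


tweets = [
    "I am very happy!!",
    "The bus is late and I am sad :(",
    "I am very happy!!",  # exact duplicate, dropped by preprocess
    "Going to the market",
]
docs = preprocess(tweets)
preds = [classify(d) for d in docs]
gold = ["positive", "negative", "neutral"]  # hypothetical manual labels
print(preds)  # → ['positive', 'negative', 'neutral']
print(accuracy(preds, gold))  # → 1.0
```

Reporting per-class recall alongside overall accuracy mirrors the kind of comparison the abstract makes, where one lexicon can do better on negative tweets while the other does better overall.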

Citation (APA)

Tho, C., Heryadi, Y., Lukas, L., & Wibowo, A. (2021). Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach. In Journal of Physics: Conference Series (Vol. 1869). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1869/1/012084
