Mixing languages in spoken or written communication has become common practice among bilingual speakers, both in daily conversation and on social media. The lexicon-based approach is one way to perform sentiment analysis. This study compares two lexicon models, SentiWordNet and VADER, in extracting the polarity of code-mixed sentences in Indonesian and Javanese. A total of 3,963 tweets were gathered from two accounts that post code-mixed tweets. The tweets were pre-processed by removing duplicates, translating to English, filtering special characters, converting to lower case, and filtering stop words. The positive and negative word scores from each lexicon model were then combined using a simple mathematical formula to classify polarity. Compared against manual labelling, SentiWordNet performed better than VADER on negative sentiments, but neither lexicon model performed well on neutral and positive sentiments. Overall, VADER performed better than SentiWordNet. The study found that the misclassifications arose because most of the Indonesian and Javanese text consisted of words that both lexicon models considered positive.
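The pre-processing and scoring steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula, so the sketch assumes polarity is decided by comparing the summed positive and negative word scores. The lexicon values, the stop-word list, and the function names are hypothetical, and the translation step is omitted (the input is assumed to be already in English).

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score).
# These values are illustrative only and are not taken from any real lexicon.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.0, 0.75),
}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "i", "am", "this", "and", "very"}


def deduplicate(tweets):
    """Remove duplicate tweets while keeping first-seen order."""
    return list(dict.fromkeys(tweets))


def preprocess(text):
    """Filter special characters, lower-case, and filter stop words."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


def classify(text, lexicon=TOY_LEXICON):
    """Assumed rule: compare summed positive and negative word scores."""
    tokens = preprocess(text)
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

With this rule, any sentence whose words are all absent from the lexicon is labelled neutral. A real lexicon would replace `TOY_LEXICON`, and a different threshold or normalisation could be substituted for the simple comparison.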
CITATION
Tho, C., Heryadi, Y., Lukas, L., & Wibowo, A. (2021). Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach. In Journal of Physics: Conference Series (Vol. 1869). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1869/1/012084