Sentiment Analysis in Code-Mixed Telugu-English Text with Unsupervised Data Normalization

Kusampudi Siva Subrahamanyam Varma; Preetham Sathineni; Radhika Mamidi

Conference Proceedings, Open Access

International Conference Recent Advances in Natural Language Processing, RANLP (2021) 753-760

DOI: 10.26615/978-954-452-072-4_086

Abstract

In a multilingual society, people communicate in more than one language, which produces Code-Mixed data. Sentiment analysis on Code-Mixed Telugu-English Text (CMTET) poses unique challenges: the data is unstructured because of informal language, informal transliterations, and spelling errors. In this paper, we introduce an annotated dataset for Sentiment Analysis in CMTET. We also report an accuracy of 80.22% on this dataset using a novel unsupervised data normalization technique with a Multilayer Perceptron (MLP) model. The proposed data normalization technique can be extended to any NLP task involving CMTET. Further, we report an accuracy increase of 2.53% due to this data normalization approach in our best model.

Cite

Varma, K. S. S., Sathineni, P., & Mamidi, R. (2021). Sentiment Analysis in Code-Mixed Telugu-English Text with Unsupervised Data Normalization. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 753–760). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_086
