Opinion Classification on Code-mixed Tamil Language

S. Divya; N. Sripriya; Daphne Evangelin; G. Saai Sindhoora

Conference Proceedings

Communications in Computer and Information Science (2023), vol. 1802 CCIS, pp. 155–168

DOI: 10.1007/978-3-031-33231-9_10

Abstract

User Sentiment Analysis (SA) is an application of Natural Language Processing (NLP) that analyzes the opinions of individuals. User opinions help the public, business organizations, movie producers, and others to make sound decisions and improve their offerings. Some sentiments are interpreted incorrectly because of context errors such as multi-polarity. People in multilingual communities communicate in several regional languages, and social media platforms allow users to express their ideas in mixed languages. User opinions posted as a mixture of two or more languages are known as code-mixed data. Handling such data is quite challenging because it contains colloquial vocabulary and its context is difficult to interpret across the mixed languages. The proposed system addresses this issue by analyzing the efficiency of several word embedding techniques in generating contextual representations of words. To evaluate the embedding techniques, the representations they generate are given as input to a standard machine learning technique for sentiment classification, so that each embedding algorithm is judged by how well the code-mixed data can be classified from its representation. The analysis is carried out on the Dravidian Code-mixed FIRE 2020 Tamil dataset, which contains review comments collected from YouTube. The evaluation shows that the transformer model generates effective representations and that positive labels are identified with an F1 score of 0.75. The representations generated by the various embedding algorithms are also fed as input to several classification algorithms, and the accuracy of the resulting models is estimated. The results indicate that IndicBERT generates semantically effective representations and thus helps achieve higher classification accuracy.
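The evaluation setup described in the abstract (swap the embedding, hold the classifier fixed, compare per-label F1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `char_ngram_embedding` is a toy stand-in for a real embedding model such as IndicBERT, the nearest-centroid classifier is an assumed substitute for the unspecified "standard machine learning technique", and the romanized comments in `TRAIN` and `TEST` are invented examples, not taken from the FIRE 2020 dataset.

```python
import math
import zlib


def char_ngram_embedding(text, n=3, dim=64):
    """Toy embedding (assumption): hashed character n-gram counts, L2-normalized."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # crc32 gives a hash that is stable across interpreter runs.
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_centroids(vectors, labels):
    """Average the training vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid has the highest dot product with vec."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(centroids, key=lambda lab: dot(centroids[lab], vec))


def f1_for_label(y_true, y_pred, label):
    """F1 score for a single label, computed from precision and recall."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(embed, train, test, label="positive"):
    """Train the fixed classifier on embed(text) and report F1 for one label."""
    centroids = fit_centroids([embed(t) for t, _ in train], [y for _, y in train])
    y_true = [y for _, y in test]
    y_pred = [predict(centroids, embed(t)) for t, _ in test]
    return f1_for_label(y_true, y_pred, label)


# Invented code-mixed examples, for illustration only.
TRAIN = [
    ("padam semma super", "positive"),
    ("movie romba nalla irukku", "positive"),
    ("trailer mokka waste", "negative"),
    ("padam bore ah irukku", "negative"),
]
TEST = [
    ("song semma nalla irukku", "positive"),
    ("story mokka bore", "negative"),
]

if __name__ == "__main__":
    print("F1 (positive):", evaluate(char_ngram_embedding, TRAIN, TEST))
```

To compare embedding techniques in this sketch, one would pass a different `embed` function to `evaluate` while leaving the classifier and the data split unchanged, so that any difference in F1 is attributable to the representation.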

Citation (APA)

Divya, S., Sripriya, N., Evangelin, D., & Saai Sindhoora, G. (2023). Opinion Classification on Code-mixed Tamil Language. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 155–168). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_10
