Automatic Normalization of Word Variations in Code-Mixed Social Media Text

Rajat Singh; Nurendra Choudhary; Manish Shrivastava

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2023), Vol. 13396 LNCS, pp. 371–381

DOI: 10.1007/978-3-031-23793-5_30

Abstract

Social media platforms such as Twitter and Facebook are becoming popular in multilingual societies. This trend leads to the blending of South Asian languages with English. Code-mixed data, in which multiple languages are blended, has recently become popular in research communities for various NLP tasks. Code-mixed data contains anomalies such as grammatical errors and spelling variations. In this paper, we leverage the contextual property of words: different spelling variations of a word share similar contexts in large, noisy social media text. We capture the variations of words that belong to the same context in an unsupervised manner using distributed representations of words. Our experiments reveal that preprocessing a code-mixed dataset with our approach improves the performance of state-of-the-art part-of-speech tagging (POS tagging) and sentiment analysis systems.
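The core idea of the abstract, that spelling variants of a word occur in similar contexts, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes raw co-occurrence counts for learned distributed representations, adds a character-level similarity filter, and picks the most frequent variant as the normalized form. The function names, thresholds, and toy romanized Hindi-English corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def variant_pairs(sentences, min_context=0.5, min_spelling=0.6):
    """Return word pairs that look alike and occur in similar contexts."""
    vectors = context_vectors(sentences)
    words = sorted(vectors)
    pairs = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            # Assumed filter: require surface similarity as well as context similarity.
            spelling = SequenceMatcher(None, w1, w2).ratio()
            if spelling >= min_spelling and cosine(vectors[w1], vectors[w2]) >= min_context:
                pairs.append((w1, w2))
    return pairs


def normalization_map(sentences, pairs):
    """Merge paired variants and map each one to the most frequent form in its group."""
    freq = Counter(token for tokens in sentences for token in tokens)
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            word = parent[word]
        return word

    for w1, w2 in pairs:
        r1, r2 = find(w1), find(w2)
        if r1 != r2:
            # Assumed heuristic: the more frequent spelling becomes the canonical form.
            keep, drop = (r1, r2) if freq[r1] >= freq[r2] else (r2, r1)
            parent[drop] = keep
    return {word: find(word) for word in parent if find(word) != word}


# Hypothetical toy corpus; "bahut", "bahot", and "bhut" are variants of one word.
CORPUS = [
    "movie bahut achi thi".split(),
    "movie bahot achi thi".split(),
    "khana bahut acha tha".split(),
    "khana bhut acha tha".split(),
    "yeh bahut mast hai".split(),
    "yeh bahot mast hai".split(),
]

if __name__ == "__main__":
    found = variant_pairs(CORPUS)
    print(found)
    print(normalization_map(CORPUS, found))
```

On this toy corpus the context filter is what separates true variants from look-alikes: "achi" and "acha" are close in spelling but share few context words, so they are not merged. With a real corpus, the count vectors would be replaced by embeddings trained on the noisy text, and the thresholds would need tuning.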

Citation (APA)

Singh, R., Choudhary, N., & Shrivastava, M. (2023). Automatic Normalization of Word Variations in Code-Mixed Social Media Text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13396 LNCS, pp. 371–381). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-23793-5_30
