A new methodology for language identification in social media code-mixed text

Yogesh Gupta; Ghanshyam Raghuwanshi; Aprna Tripathi

Conference Proceedings

Advances in Intelligent Systems and Computing (2021), vol. 1141, pp. 243-254

DOI: 10.1007/978-981-15-3383-9_22

4 Citations

7 Readers

Abstract

Transliteration is currently an active research area in Natural Language Processing. Transliteration means transferring a word from one language to another, and it is mostly used on cross-language platforms. People commonly use code-mixed language to share their views on social media such as Twitter and WhatsApp. In code-mixed text, one language is written in the script of another, so identifying the language of each word is essential for processing such text. This paper therefore implements a deep learning model based on Bidirectional Long Short-Term Memory (BLSTM) for Indian social media texts. The model identifies the language of origin of each word in a sequence based on the words that precede it in the sequence. The proposed model achieves better accuracy with word embeddings than with character embeddings.

Citation (APA)

Gupta, Y., Raghuwanshi, G., & Tripathi, A. (2021). A new methodology for language identification in social media code-mixed text. In Advances in Intelligent Systems and Computing (Vol. 1141, pp. 243–254). Springer. https://doi.org/10.1007/978-981-15-3383-9_22
