Identification of monolingual and code-switch information from English-Kannada code-switch data

Abstract

Code-switching, also called code-mixing, is the use of more than one language within a single communication. It is very common in social media, particularly in multilingual countries such as India. Important applications of code-switch processing include machine translation (MT), shallow parsing, dialog systems, and semantic parsing. Identifying code-switch and monolingual information is useful for better communication on online networking websites. In this paper, we apply a character-level n-gram approach to identify monolingual and code-switch information in English-Kannada social media data. We compare several machine learning techniques, namely naïve Bayes (NB), support vector classifier (SVC), logistic regression (LR), and neural network (NN), on English-Kannada code-switch (EKCS) data. The character-level n-gram approach yields an improvement of 1.8% to 4.1% in accuracy and 1.6% to 3.8% in F1-score. SVC and NN perform best with character-level n-grams, reaching an accuracy of 97.9% and an F1-score of 98%.
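To make the idea of character-level n-gram features concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes n-gram lengths of 1 to 3, lowercased input, and a small multinomial naïve Bayes classifier with add-one smoothing standing in for the NB, SVC, LR, and NN models compared in the paper. The names `char_ngrams` and `CharNgramNB`, the label strings, and the toy sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return every character n-gram of length n_min..n_max in text."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


class CharNgramNB:
    """Multinomial naive Bayes over character n-gram counts.

    Illustrative stand-in only; the n-gram range and add-one
    smoothing are assumptions, not settings taken from the paper.
    """

    def __init__(self, n_min=1, n_max=3):
        self.n_min = n_min
        self.n_max = n_max

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        # n-grams never seen during training are ignored
        grams = [
            g for g in char_ngrams(text, self.n_min, self.n_max)
            if g in self.vocab
        ]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus add-one smoothed log likelihoods
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy data, far too small to train a useful model.
    texts = [
        "i am going to the office now",
        "naanu office ge hogtha idini",
    ]
    labels = ["monolingual", "code-switch"]
    model = CharNgramNB().fit(texts, labels)
    print(model.predict("office ge hogtha"))
```

Character n-grams are a natural fit for romanized social media text because they pick up sub-word patterns such as suffixes and spelling variants without needing a tokenizer or a dictionary for either language.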

Cite

Chundi, R., Hulipalled, V. R., & Simha, J. B. (2023). Identification of monolingual and code-switch information from English-Kannada code-switch data. International Journal of Electrical and Computer Engineering, 13(5), 5632–5640. https://doi.org/10.11591/ijece.v13i5.pp5632-5640
