Abstract
Code-switching, also called code-mixing, is the use of more than one language in a single communication. It is very common on social media, particularly in multilingual countries such as India. Applications that involve code-switched text include machine translation (MT), shallow parsing, dialog systems, and semantic parsing. Distinguishing code-switched from monolingual text is useful for improving communication on online networking websites. In this paper, we apply a character-level n-gram approach to identify monolingual and code-switched text in English-Kannada social media data. We compare several machine learning techniques, namely naïve Bayes (NB), support vector classifier (SVC), logistic regression (LR), and neural network (NN), on English-Kannada code-switch (EKCS) data. The character-level n-gram approach yields an improvement of 1.8% to 4.1% in accuracy and 1.6% to 3.8% in F1-score. We also observe that SVC and NN outperform the other techniques with character-level n-grams, reaching an accuracy of 97.9% and an F1-score of 98%.
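The idea of character-level n-gram features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the n-gram range, the preprocessing, or the classifier settings, so the range 1 to 3, the Laplace smoothing, the class name CharNgramNB, and the four toy sentences below are all assumptions made for illustration. The toy sentences are invented and are not taken from the EKCS data. Only naïve Bayes is shown, using the Python standard library.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Hypothetical multinomial naive Bayes over character n-gram counts."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)  # log prior
            for gram in char_ngrams(text, self.n_min, self.n_max):
                if gram in self.vocab:                # skip unseen n-grams
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples (romanized Kannada mixed with English), NOT EKCS data.
train_texts = [
    "i am going to the office now",
    "the movie was really good",
    "naanu office ge hogtini",
    "movie tumba chennagide really",
]
train_labels = ["monolingual", "monolingual", "code-switch", "code-switch"]

model = CharNgramNB().fit(train_texts, train_labels)
prediction = model.predict("naanu movie ge hogtini")
```

Character n-grams capture sub-word spelling patterns, which is one plausible reason they help on romanized, non-standardized social media text. A realistic experiment would need a labeled corpus, a held-out test split, and the other classifiers named in the abstract.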
Cite
Chundi, R., Hulipalled, V. R., & Simha, J. B. (2023). Identification of monolingual and code-switch information from English-Kannada code-switch data. International Journal of Electrical and Computer Engineering, 13(5), 5632–5640. https://doi.org/10.11591/ijece.v13i5.pp5632-5640