A Systematic Review on Language Identification of Code-Mixed Text: Techniques, Data Availability, Challenges, and Framework Development


Abstract

The mixing of a native language with other languages (code-mixing) on social media poses a severe challenge for language identification (LID) systems and has motivated research on code-mixed LID solutions. This study addresses four aspects of code-mixed LID: the techniques applied, the challenges encountered, the availability of datasets with corresponding quality criteria, and the development of a comprehensive framework. We also identify gaps and opportunities for future work and, based on our analysis of the reviewed studies, outline key points for future research. We present a taxonomy of the techniques applied to code-mixed LID and highlight their variants. We found four significant challenges in code-mixed LID tasks: ambiguity, lexical borrowing, non-standard words, and intra-word code-mixing. This systematic literature review identified 32 code-mixed datasets available for LID. We propose five features to describe dataset quality: the number of instances or sentences, the percentage of code-mixed types in the data, the number of tokens, the number of unique tokens, and the average sentence length. Finally, we synthesise the methodologies from our literature analysis and propose a conceptual framework for subsequent studies.
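
The five dataset descriptors named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a dataset stored as a list of sentences, each a list of (token, language tag) pairs, and it reads "percentage of code-mixed types" as the share of sentences containing more than one language tag, which is one possible interpretation. The function names, the tag names, and the sample data are hypothetical.

```python
# Tags that do not name a language (punctuation, named entities, ...).
# This set is an assumption; adapt it to the tag scheme of the dataset at hand.
NON_LANGUAGE_TAGS = {"other", "ne", "univ"}


def is_code_mixed(sentence):
    """Return True if the sentence carries more than one language tag."""
    languages = {tag for _, tag in sentence if tag not in NON_LANGUAGE_TAGS}
    return len(languages) > 1


def describe_dataset(sentences):
    """Compute the five descriptors for a list of tagged sentences.

    Each sentence is a list of (token, language_tag) pairs.
    Tokens are lower-cased before counting (an assumption of this sketch).
    """
    num_sentences = len(sentences)
    tokens = [token.lower() for sentence in sentences for token, _ in sentence]
    num_mixed = sum(1 for sentence in sentences if is_code_mixed(sentence))
    return {
        "num_sentences": num_sentences,
        # Share of sentences that mix languages, in percent.
        "pct_code_mixed": 100.0 * num_mixed / num_sentences if num_sentences else 0.0,
        "num_tokens": len(tokens),
        "num_unique_tokens": len(set(tokens)),
        "avg_sentence_length": len(tokens) / num_sentences if num_sentences else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical English-Indonesian sample, for illustration only.
    sample = [
        [("I", "en"), ("love", "en"), ("nasi", "id"), ("goreng", "id")],
        [("good", "en"), ("morning", "en")],
        [("selamat", "id"), ("pagi", "id"), ("!", "other")],
    ]
    for name, value in describe_dataset(sample).items():
        print(f"{name}: {value}")
```

Reporting these figures side by side makes datasets of different sizes and mixing densities easier to compare. A token-level or switch-point-level measure of mixing could replace the sentence-level percentage if that is closer to the intended criterion.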

Cite

Hidayatullah, A. F., Qazi, A., Lai, D. T. C., & Apong, R. A. (2022). A Systematic Review on Language Identification of Code-Mixed Text: Techniques, Data Availability, Challenges, and Framework Development. IEEE Access, 10, 122812–122831. https://doi.org/10.1109/ACCESS.2022.3223703
