Abstract
We examine the efficacy of various feature–learner combinations for language identification, at both the token and turn levels, in different types of text-based code-switched interactions: human-human dialog, human-machine dialog, and monolog. To examine how well such methods generalize across language pairs and datasets, we analyze ten different datasets of code-switched text. We extract a variety of character- and word-based text features and pass them to multiple learners, including conditional random fields, logistic regressors, and recurrent neural networks. We further examine whether character-level embeddings and GloVe features improve performance, and observe that our best-performing text system significantly outperforms the majority-vote baseline across language pairs and datasets.
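The abstract pairs character-based features with learners such as logistic regression and compares them against a majority-vote baseline for token-level language identification. The sketch below shows how those pieces can fit together on a tiny hypothetical Spanish-English token set. The tokens, labels, the `TokenLID` class, and the hyperparameters (`lr`, `epochs`, n-gram range 1 to 3) are illustrative assumptions, not the authors' feature set, corpora, or models. It is a plain binary logistic regression trained with SGD in pure Python, not any particular library's implementation.

```python
import math
from collections import Counter


def char_ngrams(token, n_min=1, n_max=3):
    """Set of character n-grams of the lowercased token, padded with < and > boundary markers."""
    padded = "<" + token.lower() + ">"
    feats = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats.add(padded[i:i + n])
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TokenLID:
    """Hypothetical binary token-level language ID model: logistic regression over char n-grams."""

    def __init__(self, labels=("en", "es"), lr=0.1, epochs=500):
        self.neg, self.pos = labels  # the second label is treated as the positive class
        self.lr = lr
        self.epochs = epochs
        self.w = {}   # sparse weights keyed by n-gram string
        self.b = 0.0  # bias term

    def _score(self, feats):
        return self.b + sum(self.w.get(f, 0.0) for f in feats)

    def fit(self, tokens, labels):
        data = [(char_ngrams(t), 1.0 if y == self.pos else 0.0) for t, y in zip(tokens, labels)]
        for _ in range(self.epochs):
            for feats, y in data:  # fixed order keeps training deterministic
                g = y - sigmoid(self._score(feats))  # gradient of the log-likelihood w.r.t. the score
                self.b += self.lr * g
                for f in feats:
                    self.w[f] = self.w.get(f, 0.0) + self.lr * g
        return self

    def predict(self, tokens):
        return [self.pos if sigmoid(self._score(char_ngrams(t))) >= 0.5 else self.neg
                for t in tokens]


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every one of n tokens."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Hypothetical toy training data (English is the majority language here).
train_tokens = ["I", "really", "want", "the", "coffee", "with", "milk", "this", "morning", "thanks",
                "quiero", "el", "café", "con", "leche", "mañana"]
train_labels = ["en"] * 10 + ["es"] * 6

if __name__ == "__main__":
    model = TokenLID().fit(train_tokens, train_labels)
    turn = "I want el café con milk".split()  # a made-up code-switched turn
    print(list(zip(turn, model.predict(turn))))
    print("majority baseline:", majority_baseline(train_labels, len(turn)))
```

Comparing against the majority-vote baseline mirrors the evaluation framing in the abstract. A realistic system would evaluate on held-out data and could add word-level features (such as GloVe vectors or character embeddings) or use a sequence model such as a CRF or RNN so that predictions can draw on neighboring tokens.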
Citation
Ramanarayanan, V., & Pugh, R. (2018). Automatic token and turn level language identification for code-switched text dialog: An analysis across language pairs and corpora. In SIGDIAL 2018 - 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Proceedings of the Conference (pp. 80–88). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-5009