Automatic token and turn level language identification for code-switched text dialog: An analysis across language pairs and corpora

Abstract

We examine the efficacy of various feature–learner combinations for language identification in different types of text-based code-switched interactions – human-human dialog, human-machine dialog, as well as monolog – at both the token and turn levels. To examine how well such methods generalize across language pairs and datasets, we analyze ten different datasets of code-switched text. We extract a variety of character- and word-based text features and pass them into multiple learners, including conditional random fields, logistic regressors, and recurrent neural networks. We further examine the efficacy of character-level embedding and GloVe features in improving performance and observe that our best-performing text system significantly outperforms the majority-vote baseline across language pairs and datasets.
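The token-level setup described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' system: the paper uses conditional random fields, logistic regressors, and recurrent neural networks, whereas this sketch substitutes a small naive Bayes classifier over character n-grams to stay within the standard library. The function and class names, the toy Spanish-English tokens, and the rule that a turn containing more than one language is labeled "mixed" are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n_min=1, n_max=3):
    """Character n-grams of a lowercased token, padded with boundary markers."""
    padded = f"<{token.lower()}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def majority_label(labels):
    """Majority-vote baseline: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


class NgramNaiveBayes:
    """Toy stand-in learner (hypothetical; not one of the paper's models)."""

    def fit(self, tokens, labels):
        self.label_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
        for tok, lab in zip(tokens, labels):
            grams = char_ngrams(tok)
            self.ngram_counts[lab].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, token):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for lab, count in self.label_counts.items():
            # Add-one smoothing over the n-gram vocabulary seen in training.
            denom = sum(self.ngram_counts[lab].values()) + len(self.vocab)
            score = math.log(count / total)
            for gram in char_ngrams(token):
                score += math.log((self.ngram_counts[lab][gram] + 1) / denom)
            if score > best_score:
                best, best_score = lab, score
        return best


def turn_label(token_labels):
    """Assumed turn-level rule: one language if uniform, otherwise 'mixed'."""
    langs = set(token_labels)
    return langs.pop() if len(langs) == 1 else "mixed"


# Toy data invented for this sketch, not drawn from the paper's corpora.
train_tokens = ["the", "house", "is", "la", "casa", "es", "muy", "grande"]
train_labels = ["en", "en", "en", "es", "es", "es", "es", "es"]

model = NgramNaiveBayes().fit(train_tokens, train_labels)
baseline = majority_label(train_labels)
predicted = [model.predict(tok) for tok in ["the", "casa"]]
print(baseline, predicted, turn_label(predicted))
```

On real data, the comparison of interest is the learner's accuracy against the majority-vote baseline on held-out turns, computed separately for each language pair and dataset.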

Cite

Ramanarayanan, V., & Pugh, R. (2018). Automatic token and turn level language identification for code-switched text dialog: An analysis across language pairs and corpora. In SIGDIAL 2018 - 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Proceedings of the Conference (pp. 80–88). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-5009
