Automatic token and turn level language identification for code-switched text dialog: An analysis across language pairs and corpora

Vikram Ramanarayanan; Robert Pugh

Conference Proceedings, Open Access

SIGDIAL 2018 - 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Proceedings of the Conference (2018) 80-88

DOI: 10.18653/v1/w18-5009

Abstract

We examine the efficacy of various feature–learner combinations for language identification in different types of text-based code-switched interactions – human-human dialog, human-machine dialog, as well as monolog – at both the token and turn levels. In order to examine the generalization of such methods across language pairs and datasets, we analyze ten different datasets of code-switched text. We extract a variety of character- and word-based text features and pass them into multiple learners, including conditional random fields, logistic regressors, and recurrent neural networks. We further examine the efficacy of character-level embedding and GloVe features in improving performance and observe that our best-performing text system significantly outperforms the majority vote baseline across language pairs and datasets.
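To make the token-level setup concrete, here is a minimal sketch, not the authors' implementation. It extracts character n-gram features from tokens, computes a majority vote baseline, and trains a small Naive Bayes classifier. Naive Bayes is a stand-in chosen for brevity; the learners named in the abstract are conditional random fields, logistic regressors, and recurrent neural networks. The function and class names, the n-gram range, and the English/Spanish toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n_min=1, n_max=3):
    """Return character n-grams of a lowercased token with boundary markers."""
    padded = f"<{token.lower()}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def majority_baseline(train_labels):
    """Return the most frequent language label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


class NaiveBayesTokenLID:
    """Multinomial Naive Bayes over character n-grams (illustrative stand-in)."""

    def fit(self, tokens, labels):
        self.label_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
        for token, label in zip(tokens, labels):
            grams = char_ngrams(token)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, token):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(token):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: one language label per token.
train_tokens = ["the", "house", "is", "very", "big", "and",
                "la", "casa", "es", "muy", "grande", "y"]
train_labels = ["en"] * 6 + ["es"] * 6

model = NaiveBayesTokenLID().fit(train_tokens, train_labels)
baseline_label = majority_baseline(["en", "en", "es"])
```

A system evaluated this way would be compared against `baseline_label` assigned to every token, which is the kind of majority vote baseline the abstract refers to.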

Citation (APA)

Ramanarayanan, V., & Pugh, R. (2018). Automatic token and turn level language identification for code-switched text dialog: An analysis across language pairs and corpora. In SIGDIAL 2018 - 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Proceedings of the Conference (pp. 80–88). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-5009
