Codeswitching language identification using Subword Information Enriched Word Vectors

Abstract

Codeswitching is a widely observed phenomenon among bilingual speakers. By combining subword information enriched word vectors with a linear-chain Conditional Random Field, we develop a supervised machine learning model that identifies languages in English-Spanish codeswitched tweets. Our computational method achieves a tweet-level weighted F1 of 0.83 and a token-level accuracy of 0.949 without using any external resource. The result demonstrates that named entity recognition remains a challenge in codeswitched texts and warrants further work.
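The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the n-gram range of 3 to 6 characters, the `<` and `>` boundary markers, and the `en`/`es` label names are assumptions chosen for illustration. The first function splits a token into character n-grams, the kind of subword unit that subword-enriched word vectors are built from; the second performs Viterbi decoding, the standard way to pick the best label sequence under a linear-chain model, given emission and transition scores that are supplied by hand here rather than learned.

```python
from typing import Dict, List, Sequence, Tuple


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of a word wrapped in boundary markers.

    The n-gram range and the '<' / '>' markers are illustrative assumptions.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model.

    emissions[t][y] is the score of label y at position t; transitions[(p, y)]
    is the score of moving from label p to label y. Missing entries count as 0.
    """
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at position t.
    best = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back = []  # back[t - 1][y]: best previous label for y at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + transitions.get((prev, y), 0.0)
                + emissions[t].get(y, 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

In a full system the emission scores would come from features such as the subword-based word vectors, and both score tables would be learned from labeled tweets; the hand-set scores accepted here only show how decoding ties neighboring token labels together.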

Xia, M. X. (2016). Codeswitching language identification using Subword Information Enriched Word Vectors. In EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (pp. 132–136). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5818
