Codeswitching is a widely observed phenomenon among bilingual speakers. By combining subword-information-enriched word vectors with a linear-chain Conditional Random Field, we develop a supervised machine learning model that identifies the languages in English-Spanish codeswitched tweets. Our computational method achieves a tweet-level weighted F1 of 0.83 and a token-level accuracy of 0.949 without using any external resources. The results demonstrate that named entity recognition remains a challenge in codeswitched texts and warrants further work.
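As a rough illustration of the subword idea, the sketch below extracts boundary-marked character n-grams from each token and packs them into per-token feature dictionaries of the kind a linear-chain CRF toolkit typically consumes. This is not the paper's implementation: the paper uses learned subword-enriched word vectors, whereas this sketch swaps in sparse n-gram indicator features for simplicity. The function names `char_ngrams` and `token_features`, the n-gram range of 3 to 6, the `<s>` and `</s>` padding symbols, and the example tweet are all assumptions made for illustration.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a hypothetical feature dict for token i, including its neighbours."""
    word = tokens[i].lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": word,
        # Neighbouring tokens give the sequence model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }
    # One indicator feature per subword n-gram (assumption: sparse features
    # stand in for the dense subword-enriched vectors used in the paper).
    for gram in char_ngrams(word):
        feats[f"ng={gram}"] = True
    return feats


# Hypothetical codeswitched tweet, tokenized on whitespace.
tweet = ["I", "love", "los", "tacos"]
features = [token_features(tweet, i) for i in range(len(tweet))]
```

Each dictionary in `features` corresponds to one token, so the list as a whole represents one tweet as a sequence. Subword n-grams are useful here because character patterns often hint at the language of a word even when the word itself was never seen in training.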
Xia, M. X. (2016). Codeswitching language identification using Subword Information Enriched Word Vectors. In EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (pp. 132–136). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5818