Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet

Utpal Kumar Sikdar; Björn Gambäck

Conference Proceedings

EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (2016) 127-131

DOI: 10.18653/v1/w16-5817

Abstract

The paper outlines a supervised approach to language identification in code-switched data. The task is framed as sequence labeling: the language label of each token is predicted by a Conditional Random Fields (CRF) classifier trained on a range of features, extracted both from the training data and from Babelnet and Babelfy. On the development dataset provided by the organizers of the shared task on language identification in code-switched data, the method obtained tweet-level monolingual, code-switched and weighted F1-scores of 94%, 85% and 91%, respectively, with a token-level accuracy of 95.8%. On the unseen test data, the system achieved tweet-level monolingual, code-switched and weighted F1-scores of 90%, 85% and 87.4%, respectively, and a token-level accuracy of 95.7%.
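The abstract names two steps without detailing them: building per-token features for a CRF, and going from token labels to the tweet-level scores. The Python sketch below illustrates both under stated assumptions. The feature set is a hypothetical, generic one (affixes, casing, neighboring words). It does not reproduce the authors' features and leaves out the Babelnet/Babelfy-derived ones entirely. Several other details are also assumptions rather than details from the paper: the tag names `lang1`/`lang2`, the rule that a tweet counts as code-switched when its tokens carry more than one language label, and the reading of "weighted F1" as the support-weighted average of the two tweet-level class F1-scores.

```python
from typing import Dict, List

# Hypothetical tag names for the two languages of a language pair;
# any other tag (e.g. "other") is ignored by tweet_type.
LANGUAGE_LABELS = {"lang1", "lang2"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Illustrative feature dict for token i (not the paper's feature set)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "is_mention": tok.startswith("@"),
        "is_hashtag": tok.startswith("#"),
        "has_non_ascii": any(ord(ch) > 127 for ch in tok),
    }
    # One token of left and right context; mark tweet boundaries otherwise.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token: the sequence a linear-chain CRF is trained on."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def tweet_type(labels: List[str]) -> str:
    """Assumed rule: code-switched if the token labels contain more than one language."""
    langs = {lab for lab in labels if lab in LANGUAGE_LABELS}
    return "code-switched" if len(langs) > 1 else "monolingual"


def f1(gold: List[str], pred: List[str], cls: str) -> float:
    """Per-class F1 over tweet-level labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(gold: List[str], pred: List[str]) -> float:
    """Assumed definition: class F1-scores averaged with gold-support weights."""
    total = len(gold)
    return sum(
        f1(gold, pred, c) * gold.count(c) / total
        for c in ("monolingual", "code-switched")
    )


# Toy example: tweet types derived from (made-up) predicted token labels.
gold = ["monolingual", "monolingual", "code-switched", "code-switched"]
pred = [tweet_type(["lang1", "lang1", "other"]),
        tweet_type(["lang1", "lang2"]),
        tweet_type(["lang2", "lang1", "lang1"]),
        tweet_type(["lang1", "other", "lang2"])]
print(round(weighted_f1(gold, pred), 4))  # 0.7333
```

The output of `sentence_features` is a list of feature dicts per tweet. This is the input shape that CRF wrappers such as sklearn-crfsuite expect for training. The CRF itself is not shown here, and the toy labels above are invented for illustration only.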

Citation (APA)

Sikdar, U. K., & Gambäck, B. (2016). Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet. In EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (pp. 127–131). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5817