Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet

Abstract

This paper presents a supervised approach to language identification in code-switched data, framed as a sequence labeling task: the label of each token is predicted by a classifier based on Conditional Random Fields and trained on a range of features, extracted both from the training data and from Babelnet and Babelfy. The method was tested on the development dataset provided by the organizers of the shared task on language identification in code-switched data, obtaining tweet-level monolingual, code-switched and weighted F1-scores of 94%, 85% and 91%, respectively, with a token-level accuracy of 95.8%. On the unseen test data, the system achieved tweet-level monolingual, code-switched and weighted F1-scores of 90%, 85% and 87.4%, respectively, and a token-level accuracy of 95.7%.
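
To make the sequence-labeling framing concrete, the following is a minimal sketch of how each token in a tweet could be turned into a feature dictionary before being passed to a CRF. The abstract does not list the actual feature set, so every feature here (lowercased form, affixes, capitalization, neighbouring tokens) is an assumption chosen for illustration, and the function names `token_features` and `tweet_to_features` are hypothetical. The Babelnet and Babelfy lookups mentioned in the abstract are omitted.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i]. The feature set is illustrative only."""
    tok = tokens[i]
    lower = tok.lower()
    feats: Dict[str, object] = {
        "lower": lower,
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "is_mention_or_hashtag": tok.startswith(("@", "#")),
        "prefix3": lower[:3],
        "suffix3": lower[-3:],
    }
    # Context features: neighbouring tokens, with sentinels at tweet boundaries.
    feats["prev"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def tweet_to_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Map one tokenized tweet to a sequence of per-token feature dicts."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical code-switched example tweet, already tokenized.
    for feats in tweet_to_features(["I", "love", "tacos", "muy", "ricos", "#yum"]):
        print(feats)
```

One list of feature dictionaries per tweet, paired with one list of language labels per tweet, is the input shape that common CRF toolkits expect for training, for example `sklearn-crfsuite`.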

Cite

Sikdar, U. K., & Gambäck, B. (2016). Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet. In EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (pp. 127–131). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5817
