Part-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling

Utsab Barman; Joachim Wagner; Jennifer Foster

Conference Proceedings

EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (2016) 30-39

DOI: 10.18653/v1/w16-5804

Abstract

Multilingual users of social media sometimes use multiple languages during conversation. Mixing multiple languages in content is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman et al., 2014) with part-of-speech (POS) tags. We investigate two state-of-the-art POS tagging techniques for code-mixed content and combine the features of the two systems to build a better POS tagger. Furthermore, we investigate the use of a joint model which performs language identification (LID) and part-of-speech (POS) tagging simultaneously.

Barman, U., Wagner, J., & Foster, J. (2016). Part-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling. In EMNLP 2016 - 2nd Workshop on Computational Approaches to Code Switching, CS 2016 - Proceedings of the Workshop (pp. 30–39). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5804
