Ten pairs to tag - Multilingual POS tagging via coarse mapping between embeddings

Abstract

In the absence of annotations in the target language, multilingual models typically draw on extensive parallel resources. In this paper, we demonstrate that accurate multilingual part-of-speech (POS) tagging can be done with just a few (e.g., ten) word translation pairs. We use the translation pairs to establish a coarse linear isometric (orthonormal) mapping between monolingual embeddings. This enables the supervised source model, expressed in terms of embeddings, to be used directly on the target language. We further refine the model in an unsupervised manner by initializing and regularizing it to be close to the direct transfer model. Averaged across six languages, our model yields a 37.5% absolute improvement over the monolingual prototype-driven method (Haghighi and Klein, 2006) when using a comparable amount of supervision. Moreover, to highlight key linguistic characteristics of the generated tags, we use them to predict typological properties of languages, obtaining a 50% error reduction relative to the prototype model.
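The mapping step can be illustrated with a minimal sketch. It assumes the orthonormal map is fit as an orthogonal Procrustes problem, the standard closed-form way to obtain an orthonormal matrix from paired vectors; the paper's exact estimation procedure may differ. The function names, the linear tagger `W`, and the synthetic data are all hypothetical. The toy uses 5-dimensional embeddings so that ten pairs recover the rotation exactly; when the embedding dimension exceeds the number of pairs, the map is underdetermined, which is one reason such a mapping can only be coarse.

```python
import numpy as np


def fit_orthonormal_map(tgt_pairs: np.ndarray, src_pairs: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes (assumed method): return the orthonormal P that
    minimizes ||tgt_pairs @ P - src_pairs||_F. Row i of each matrix holds the
    embedding of one side of the i-th translation pair."""
    # SVD of the cross-covariance X^T Y = U S V^T gives the optimum P = U V^T.
    u, _, vt = np.linalg.svd(tgt_pairs.T @ src_pairs)
    return u @ vt


def direct_transfer_tags(tgt_vecs: np.ndarray, P: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical linear source tagger W (dim x num_tags) applied to target
    embeddings after mapping them into the source embedding space."""
    return np.argmax(tgt_vecs @ P @ W, axis=1)


# Toy demo on synthetic data (not the paper's setup).
rng = np.random.default_rng(0)
dim, n_pairs, n_tags = 5, 10, 3
src_emb = rng.normal(size=(n_pairs, dim))
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthonormal "true" rotation
tgt_emb = src_emb @ R  # target space is a rotated copy of the source space

P = fit_orthonormal_map(tgt_emb, src_emb)
print(np.allclose(P @ P.T, np.eye(dim)))  # P is orthonormal
print(np.allclose(tgt_emb @ P, src_emb))  # mapped target pairs land on their source pairs

W = rng.normal(size=(dim, n_tags))  # stand-in for a trained source tagger
mapped_tags = direct_transfer_tags(tgt_emb, P, W)
print(np.array_equal(mapped_tags, np.argmax(src_emb @ W, axis=1)))
```

Because `P` is orthonormal, it preserves distances and angles, so a source model defined over embeddings sees mapped target words in the same geometry it was trained on.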

Citation (APA)

Zhang, Y., Gaddy, D., Barzilay, R., & Jaakkola, T. (2016). Ten pairs to tag - Multilingual POS tagging via coarse mapping between embeddings. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 1307–1317). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1156