Part-of-speech taggers for low-resource languages using CCA features

Abstract

In this paper, we address the challenge of creating accurate and robust part-of-speech taggers for low-resource languages. We propose a method that leverages existing parallel data between the target language and a large set of resource-rich languages without ancillary resources such as tag dictionaries. Crucially, we use CCA to induce latent word representations that incorporate cross-genre distributional cues, as well as projected tags from a full array of resource-rich languages. We develop a probability-based confidence model to identify words with highly likely tag projections and use these words to train a multi-class SVM using the CCA features. Our method yields average performance of 85% accuracy for languages with almost no resources, outperforming a state-of-the-art partially-observed CRF model.
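The confidence-filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the probability-based confidence model, so the code below swaps in a simple stand-in: it assumes confidence is the share of projected tags that agree with the most frequent one. The function name `confident_projections`, the `threshold` parameter, and the toy words are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def confident_projections(projected_tags, threshold=0.8):
    """Select words whose projected tags mostly agree.

    projected_tags maps each target-language word to the list of tags
    projected onto it from aligned words in resource-rich languages.
    Assumption: confidence is the relative frequency of the most common
    projected tag. This is a stand-in, not the paper's confidence model.
    """
    selected = {}
    for word, tags in projected_tags.items():
        if not tags:
            # No alignment evidence for this word, so skip it.
            continue
        tag, count = Counter(tags).most_common(1)[0]
        confidence = count / len(tags)
        if confidence >= threshold:
            selected[word] = (tag, confidence)
    return selected


# Toy projections for four hypothetical target-language words.
projections = {
    "w_dog": ["NOUN", "NOUN", "NOUN", "NOUN", "VERB"],
    "w_run": ["VERB", "NOUN", "VERB", "NOUN"],
    "w_and": ["CONJ", "CONJ", "CONJ"],
    "w_rare": [],
}

training_words = confident_projections(projections, threshold=0.8)
for word, (tag, confidence) in sorted(training_words.items()):
    print(word, tag, round(confidence, 2))
```

In a pipeline of the kind the abstract describes, the words kept by such a filter, paired with their CCA-derived feature vectors, would serve as training examples for the multi-class SVM. Words with ambiguous projections (here `w_run`) are left out of training and tagged by the trained classifier instead.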

Cite

Kim, Y. B., Snyder, B., & Sarikaya, R. (2015). Part-of-speech taggers for low-resource languages using CCA features. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1292–1302). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1150
