Part-of-speech taggers for low-resource languages using CCA features

Young Bum Kim; Benjamin Snyder; Ruhi Sarikaya

Conference Proceedings, Open Access

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015) 1292-1302

DOI: 10.18653/v1/d15-1150

13 Citations

108 Readers

Abstract

In this paper, we address the challenge of creating accurate and robust part-of-speech taggers for low-resource languages. We propose a method that leverages existing parallel data between the target language and a large set of resource-rich languages, without ancillary resources such as tag dictionaries. Crucially, we use canonical correlation analysis (CCA) to induce latent word representations that incorporate cross-genre distributional cues, as well as projected tags from a full array of resource-rich languages. We develop a probability-based confidence model to identify words with highly likely tag projections, and we use these words to train a multi-class SVM on the CCA features. Our method achieves an average accuracy of 85% for languages with almost no resources, outperforming a state-of-the-art partially observed CRF model.
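The two ingredients named in the abstract, CCA word representations and a confidence filter over projected tags, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names `cca_embeddings` and `confident_tags`, the ridge term `reg`, the `threshold` and `min_votes` parameters, and the use of a simple relative-frequency vote in place of the paper's probability-based confidence model. The CCA step is the standard whiten-then-SVD formulation, written with NumPy.

```python
import numpy as np
from collections import Counter


def cca_embeddings(X, Y, k, reg=1e-3):
    """Project rows of X onto the top-k canonical directions shared with Y.

    X: (n, d1) array, view one (for example, word-identity indicators).
    Y: (n, d2) array, view two (for example, context or projected-tag indicators).
    reg: small ridge term (an assumption) that keeps the covariances invertible.
    Returns the (n, k) projected view and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]  # projection matrix for view one
    return X @ A, s[:k]


def confident_tags(projected, threshold=0.8, min_votes=3):
    """Keep words whose most frequent projected tag is dominant.

    projected: dict mapping a word to the list of tags projected onto it
    from the source languages. This frequency vote is a hypothetical
    stand-in for the paper's probability-based confidence model.
    """
    kept = {}
    for word, tags in projected.items():
        if len(tags) < min_votes:
            continue  # too little evidence to trust the projection
        tag, count = Counter(tags).most_common(1)[0]
        if count / len(tags) >= threshold:
            kept[word] = tag
    return kept
```

In a pipeline of the kind the abstract describes, the words retained by the filter, paired with their CCA representations, would serve as training examples for a multi-class classifier such as an SVM.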

Cite

Kim, Y. B., Snyder, B., & Sarikaya, R. (2015). Part-of-speech taggers for low-resource languages using CCA features. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1292–1302). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1150

Readers' Seniority

PhD / Post grad / Masters / Doc: 40 (69%)
Researcher: 12 (21%)
Lecturer / Post doc: 4
Professor / Associate Prof.: 2

Readers' Discipline

Computer Science: 54 (81%)
Linguistics: 10 (15%)
Business, Management and Accounting: 2
Nursing and Health Professions: 1
