Part-of-speech taggers for low-resource languages using CCA features

Abstract

In this paper, we address the challenge of creating accurate and robust part-of-speech taggers for low-resource languages. We propose a method that leverages existing parallel data between the target language and a large set of resource-rich languages without ancillary resources such as tag dictionaries. Crucially, we use CCA to induce latent word representations that incorporate cross-genre distributional cues, as well as projected tags from a full array of resource-rich languages. We develop a probability-based confidence model to identify words with highly likely tag projections and use these words to train a multi-class SVM using the CCA features. Our method yields average performance of 85% accuracy for languages with almost no resources, outperforming a state-of-the-art partially-observed CRF model.
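
The confidence step mentioned in the abstract can be pictured as a vote over the tags projected onto a word from several source languages. The sketch below is only an illustration under that assumption, not the paper's actual model: the function names `tag_confidence` and `select_confident`, the relative-frequency estimate, the 0.8 threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def tag_confidence(projected_tags):
    """Return (most frequent tag, its relative frequency) for one word.

    `projected_tags` is a non-empty list of tags projected onto the word
    through word alignments with different source languages (assumed input).
    """
    counts = Counter(projected_tags)
    best_tag, best_count = counts.most_common(1)[0]
    return best_tag, best_count / len(projected_tags)


def select_confident(projections, threshold=0.8):
    """Keep only words whose top projected tag reaches the threshold.

    The returned word -> tag mapping stands in for the high-confidence
    training examples that would later be paired with word features
    and passed to a multi-class classifier.
    """
    selected = {}
    for word, tags in projections.items():
        if not tags:
            continue  # no projection available for this word
        tag, prob = tag_confidence(tags)
        if prob >= threshold:
            selected[word] = tag
    return selected


# Invented toy projections, for illustration only.
projections = {
    "dog": ["NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "VERB"],
    "run": ["VERB", "NOUN", "VERB", "NOUN"],
    "the": ["DET", "DET", "DET"],
}
confident = select_confident(projections)
print(confident)
```

In this toy run, "run" is dropped because its projected tags disagree too often, while "dog" and "the" are kept as training examples. A real system would estimate the probabilities from alignment statistics rather than raw vote counts.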

Cite

Kim, Y. B., Snyder, B., & Sarikaya, R. (2015). Part-of-speech taggers for low-resource languages using CCA features. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1292–1302). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1150
