If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages

46 citations
129 readers (Mendeley users who have this article in their library)

Abstract

We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel: languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better (20-30%) than state-of-the-art unsupervised POS taggers induced from Bible translations. It is often competitive with weakly supervised approaches that assume high-quality parallel corpora, representative monolingual corpora with perfect tokenization, and/or tag dictionaries. We make models for all 100 languages available.
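The core idea of aggregating tags from several annotated source languages and spreading them over word alignments can be illustrated with a small majority-vote projection step. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `project_tags`, the per-verse input format (source tag lists plus `(source_index, target_index)` alignment pairs per source language) and the tie-breaking rule are all hypothetical. The sketch covers only a single projection round for one verse. The iterative bootstrapping between languages and the tagger training described in the abstract are not shown.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input format (an assumption, not the paper's):
#   source_tags[lang] -> POS tags for one verse in an annotated source language
#   alignments[lang]  -> (source_index, target_index) word-alignment pairs linking
#                        that source verse to the same verse in the target language


def project_tags(
    target_len: int,
    source_tags: Dict[str, List[str]],
    alignments: Dict[str, List[Tuple[int, int]]],
) -> List[Optional[str]]:
    """Project POS tags onto one target verse by majority vote over source languages."""
    votes = [Counter() for _ in range(target_len)]
    for lang, pairs in alignments.items():
        tags = source_tags[lang]
        for src_i, tgt_i in pairs:
            # Each aligned source token casts one vote for its tag.
            votes[tgt_i][tags[src_i]] += 1

    projected: List[Optional[str]] = []
    for counter in votes:
        if not counter:
            projected.append(None)  # unaligned target token: leave it untagged
        else:
            # most_common orders equal counts by first insertion, so ties
            # go to the tag that was seen first.
            projected.append(counter.most_common(1)[0][0])
    return projected


if __name__ == "__main__":
    # Toy example: three source languages voting on a four-token target verse.
    src = {
        "eng": ["PRON", "VERB", "NOUN"],
        "deu": ["PRON", "VERB", "NOUN"],
        "fra": ["PRON", "NOUN", "NOUN"],
    }
    align = {
        "eng": [(0, 0), (1, 1), (2, 2)],
        "deu": [(0, 0), (1, 1), (2, 2)],
        "fra": [(0, 0), (1, 1)],
    }
    print(project_tags(4, src, align))  # ['PRON', 'VERB', 'NOUN', None]
```

Voting across many source languages means that a single noisy alignment or an idiosyncratic source annotation is outvoted whenever the other sources agree. Tokens that receive no votes stay untagged, and a tagger trained on the projected data would have to generalize to them.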

Citation (APA)

Agić, Ž., Hovy, D., & Søgaard, A. (2015). If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 268–272). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2044

Readers' seniority
PhD / Post grad / Masters / Doc: 50 (72%)
Researcher: 11 (16%)
Professor / Associate Prof.: 5 (7%)
Lecturer / Post doc: 3 (4%)

Readers' discipline
Computer Science: 62 (78%)
Linguistics: 11 (14%)
Engineering: 3 (4%)
Business, Management and Accounting: 3 (4%)
