If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages

Željko Agić; Dirk Hovy; Anders Søgaard

Conference Proceedings (Open Access)

ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (2015), Vol. 2, pp. 268-272

DOI: 10.3115/v1/p15-2044

56 Citations

133 Readers

Abstract

We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel, languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word-alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better (20-30%) than state-of-the-art unsupervised POS taggers induced from Bible translations, and is often competitive with weakly supervised approaches that assume high-quality parallel corpora, representative monolingual corpora with perfect tokenization, and/or tag dictionaries. We make models for all 100 languages available.
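The idea of aggregating tags from several annotated languages and spreading them through word alignments can be illustrated with a small sketch. This is not the authors' implementation: the function name `project_tags`, the data layout, and the use of a plain majority vote as the aggregation rule are all assumptions made for illustration, and the later step of training a tagger on the projected tags is left out.

```python
from collections import Counter, defaultdict


def project_tags(source_sentences, target_length):
    """Project POS tags onto one target-language verse by majority vote.

    source_sentences: one (tags, alignment) pair per source language, where
        tags is the list of POS tags of the source tokens and alignment is a
        list of (source_index, target_index) word-alignment links.
    target_length: number of tokens in the target verse.

    Returns one tag per target token, or None where no link reaches the token.
    Hypothetical helper: simple voting stands in for the paper's aggregation.
    """
    votes = defaultdict(Counter)
    for tags, alignment in source_sentences:
        for src_i, tgt_i in alignment:
            # Each aligned source token casts one vote for its own tag.
            votes[tgt_i][tags[src_i]] += 1

    projected = []
    for tgt_i in range(target_length):
        if votes[tgt_i]:
            projected.append(votes[tgt_i].most_common(1)[0][0])
        else:
            projected.append(None)  # unaligned target token stays untagged
    return projected


# Toy verse with four target tokens and three made-up source languages.
example_sources = [
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["PRON", "NOUN"], [(0, 0), (1, 1)]),
]
result = project_tags(example_sources, 4)
print(result)
```

In this toy input the first target token receives two votes for `DET` and one for `PRON`, so the vote settles on `DET`; the fourth token has no alignment link and is left untagged.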

Citation (APA)

Agić, Ž., Hovy, D., & Søgaard, A. (2015). If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 268–272). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2044