We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel: languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better (20-30%) than state-of-the-art unsupervised POS taggers induced from Bible translations, and is often competitive with weakly supervised approaches that assume high-quality parallel corpora, representative monolingual corpora with perfect tokenization, and/or tag dictionaries. We make models for all 100 languages available.
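The idea of aggregating tags from several annotated languages over word alignments can be illustrated with a minimal sketch. It assumes an unweighted majority vote per target token, which the abstract does not specify; the function name `project_tags` and the data layout (per-source tag lists plus `(source_index, target_index)` alignment pairs) are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter


def project_tags(num_target_tokens, sources):
    """Project POS tags onto one target-language verse by voting.

    Assumption: plain majority vote; the actual aggregation scheme may differ.

    num_target_tokens: number of tokens in the target verse.
    sources: one (tags, alignment) pair per annotated source language, where
        `tags` holds the POS tags of the source verse and `alignment` is a
        list of (source_index, target_index) word-alignment pairs.
    Returns a list with one tag per target token, or None for tokens that
    no source token aligns to.
    """
    votes = [Counter() for _ in range(num_target_tokens)]
    for tags, alignment in sources:
        for src_i, tgt_i in alignment:
            # Each aligned source token casts one vote for its own tag.
            votes[tgt_i][tags[src_i]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy verse: three hypothetical source languages, four target tokens.
example_sources = [
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["NOUN", "VERB"], [(0, 1), (1, 2)]),
    (["DET", "NOUN"], [(0, 0), (1, 1)]),
]
projected = project_tags(4, example_sources)
```

In this toy verse the fourth target token receives no votes and stays untagged; a tagger trained on such projections would have to tolerate or skip those positions.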
Agić, Ž., Hovy, D., & Søgaard, A. (2015). If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 268–272). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2044