Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

Abstract

Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri-Spanish, Guarani-Spanish, Quechua-Spanish, and Shipibo-Konibo-Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.

Cite

Ebrahimi, A., McCarthy, A. D., Oncevay, A., Chiruzzo, L., Ortega, J. E., Giménez-Lugo, G. A., … Kann, K. (2023). Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 3894–3908). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.eacl-main.280
