Low-Resource Machine Translation Using Cross-Lingual Language Model Pretraining

20 citations · 68 Mendeley readers

Abstract

This paper describes UTokyo’s submission to the AmericasNLP 2021 Shared Task on machine translation systems for indigenous languages of the Americas. We present a low-resource machine translation system that improves translation accuracy using cross-lingual language model pretraining. Our system uses the mBART implementation in fairseq to pretrain on a large set of monolingual data from a diverse set of high-resource languages before fine-tuning on 10 low-resource indigenous American languages: Aymara, Bribri, Asháninka, Guaraní, Wixarika, Náhuatl, Hñähñu, Quechua, Shipibo-Konibo, and Rarámuri. On average, our system achieved BLEU scores 1.64 points higher and chrF scores 0.0749 points higher than the baseline.
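For context, the sketch below shows what the fine-tuning step of such a pipeline can look like with fairseq's standard mBART recipe (the translation_from_pretrained_bart task). The data paths, checkpoint name, language codes, and hyperparameters are illustrative assumptions taken from fairseq's public mBART example, not the authors' reported configuration.

```python
# Minimal sketch: fine-tune a cross-lingually pretrained mBART checkpoint on one
# low-resource pair using fairseq's standard recipe. All paths, language codes,
# and hyperparameters are placeholder assumptions, not the paper's exact setup.
import subprocess

DATA_DIR = "data-bin/es-bzd"      # hypothetical binarized Spanish-Bribri data
PRETRAINED = "mbart/model.pt"     # hypothetical pretrained mBART checkpoint
LANGS = "es_XX,bzd_XX"            # hypothetical language-token list for the task

subprocess.run([
    "fairseq-train", DATA_DIR,
    "--task", "translation_from_pretrained_bart",
    "--arch", "mbart_large",
    "--langs", LANGS,
    "--source-lang", "es_XX", "--target-lang", "bzd_XX",
    "--encoder-normalize-before", "--decoder-normalize-before",
    "--layernorm-embedding",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.2",
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.98)", "--adam-eps", "1e-06",
    "--lr-scheduler", "polynomial_decay", "--lr", "3e-05",
    "--warmup-updates", "2500", "--total-num-update", "40000",
    "--dropout", "0.3", "--attention-dropout", "0.1",
    "--max-tokens", "1024", "--update-freq", "2",
    # Start from the cross-lingually pretrained weights rather than from scratch.
    "--restore-file", PRETRAINED,
    "--reset-optimizer", "--reset-meters", "--reset-dataloader", "--reset-lr-scheduler",
], check=True)
```

The key design choice this illustrates is that the monolingual pretraining produces a single multilingual checkpoint, which is then restored (via --restore-file) and adapted separately to each of the ten translation directions.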

Citation (APA)

Zheng, F., Reid, M., Marrese-Taylor, E., & Matsuo, Y. (2021). Low-Resource Machine Translation Using Cross-Lingual Language Model Pretraining. In Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 (pp. 234–240). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.americasnlp-1.26
