Leveraging monolingual data with self-supervision for multilingual neural machine translation

Abstract

Over the last few years, two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve translation quality for low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, reaching up to 33 BLEU on WMT ro-en translation without any parallel data or back-translation.
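The self-supervised objective the paper pairs with supervised translation is a MASS-style masked sequence-to-sequence denoising task on monolingual text: a contiguous span of the sentence is masked on the source side and the model learns to reconstruct it. The sketch below shows how one such training example might be constructed; the function name, the <mask> token, and the <2xx> target-language tag are illustrative assumptions, not the paper's exact implementation.

```python
import random

MASK = "<mask>"

def mass_example(tokens, lang, mask_ratio=0.5, rng=None):
    """Turn one monolingual sentence into a (source, target) denoising pair.

    MASS-style objective: replace a contiguous span (about mask_ratio of the
    tokens) with <mask> on the source side and train the model to reconstruct
    that span. The <2xx> prefix is the usual multilingual-NMT target-language
    tag; for self-supervision the "target language" is the sentence's own
    language. Assumes a non-empty token list.
    """
    rng = rng or random.Random()
    n = len(tokens)
    span = max(1, int(n * mask_ratio))
    start = rng.randrange(n - span + 1)
    source = [f"<2{lang}>"] + tokens[:start] + [MASK] * span + tokens[start + span:]
    target = tokens[start:start + span]
    return source, target

# A monolingual Romanian sentence becomes a self-supervised training pair
# that can be mixed into the same batches as parallel translation data.
src, tgt = mass_example("aceasta este o propozitie de exemplu".split(), "ro",
                        rng=random.Random(0))
print(src)  # source with a masked span, e.g. ['<2ro>', 'aceasta', '<mask>', ...]
print(tgt)  # the masked span the decoder must reconstruct
```

Because the self-supervised examples share the model, vocabulary, and language tags with the parallel data, monolingual text for a new language can be added to training without any parallel sentences for that language, which is what enables the zero-resource ro-en result quoted in the abstract.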

Citation (APA)

Siddhant, A., Bapna, A., Cao, Y., Firat, O., Chen, M., Kudugunta, S., … Wu, Y. (2020). Leveraging monolingual data with self-supervision for multilingual neural machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 2827–2835). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.252
