Universal Conditional Masked Language Pre-training for Neural Machine Translation

22 citations · 68 Mendeley readers

Abstract

Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Different from prior works, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both Autoregressive and Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching & masking and dynamic dual-masking. We conduct extensive experiments and show that CeMAT achieves significant performance improvements in all scenarios, from low- to extremely high-resource languages, i.e., up to +14.4 BLEU on low-resource languages and +7.9 BLEU on average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate that it also produces consistent performance gains, i.e., up to +5.3 BLEU. To the best of our knowledge, this is the first work to pre-train a unified model for fine-tuning on both NMT tasks.
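To make the dual-masking idea described above concrete, the sketch below shows one minimal, hypothetical way to mask tokens on both the source and target sides of a sentence pair, so that a bidirectional decoder can be trained to recover the masked target tokens conditioned on a partially masked source. The function name dual_mask, the mask ratios, and the uniform position sampling are illustrative assumptions only; they do not reproduce the paper's dynamic masking schedule or its aligned code-switching & masking procedure.

```python
import random

MASK = "<mask>"

def dual_mask(src_tokens, tgt_tokens, src_ratio=0.15, tgt_ratio=0.3, seed=None):
    """Illustrative dual-masking: independently mask random positions in the
    source and (more aggressively) in the target. A bidirectional decoder can
    then be trained to predict the original target tokens at the masked
    positions, attending to the full target context and the masked source.
    Ratios here are placeholders, not the values used in the paper."""
    rng = random.Random(seed)

    def mask_seq(tokens, ratio):
        masked = list(tokens)
        # Positions are chosen uniformly at random; the paper's dynamic
        # dual-masking would vary how much is masked during training,
        # which this sketch does not model.
        n = max(1, int(len(tokens) * ratio))
        for i in rng.sample(range(len(tokens)), n):
            masked[i] = MASK
        return masked

    return mask_seq(src_tokens, src_ratio), mask_seq(tgt_tokens, tgt_ratio)

# Example with a German-English sentence pair.
src = "Das Haus ist klein".split()
tgt = "the house is small".split()
masked_src, masked_tgt = dual_mask(src, tgt, seed=0)
print(masked_src, masked_tgt)
```

In CeMAT-style pre-training, the training objective would then be to predict the original tokens at the masked target positions (and, for the code-switching variant, source tokens replaced by their aligned translations), but those details are beyond this toy example.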

Citation (APA)

Li, P., Li, L., Zhang, M., Wu, M., & Liu, Q. (2022). Universal Conditional Masked Language Pre-training for Neural Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 6379–6391). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.442
