XLM-D: Decorate Cross-lingual Pre-training Model as Non-Autoregressive Neural Machine Translation

Abstract

Pre-trained language models have achieved remarkable success in numerous natural language understanding and autoregressive generation tasks, but non-autoregressive generation in applications such as machine translation has not yet benefited sufficiently from the pre-training paradigm. In this work, we establish the connection between a pre-trained masked language model (MLM) and non-autoregressive generation for machine translation. From this perspective, we present XLM-D, which seamlessly transforms an off-the-shelf cross-lingual pre-training model into a non-autoregressive translation (NAT) model with a lightweight yet effective decorator. Specifically, the decorator preserves the representation consistency of the pre-trained model and introduces only one additional trainable parameter. Extensive experiments on typical translation datasets show that our models obtain state-of-the-art performance while achieving an inference speedup of 19.9×. One striking result is that on WMT14 En⇒De, XLM-D obtains 29.80 BLEU with multiple iterations, outperforming the previous mask-predict model by 2.77 points.
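For context, the abstract compares against mask-predict style iterative non-autoregressive decoding (Ghazvininejad et al., 2019), in which a masked language model fills in all target positions in parallel and then repeatedly re-masks and re-predicts its least confident tokens. The sketch below illustrates only that generic decoding loop, not the XLM-D decorator itself; the ToyMLM model, the token ids, the linear re-masking schedule, and the omission of source-side conditioning are simplifying assumptions for illustration.

# Minimal sketch of mask-predict style iterative NAT decoding (assumptions noted above).
import torch
import torch.nn as nn

MASK_ID, VOCAB = 0, 100  # hypothetical mask token id and vocabulary size

class ToyMLM(nn.Module):
    """Stand-in masked language model: embeds tokens, contextualizes them with a
    small Transformer encoder, and predicts a vocabulary distribution per position."""
    def __init__(self, vocab=VOCAB, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.out(self.enc(self.emb(tokens)))  # (batch, len, vocab)

@torch.no_grad()
def mask_predict(model, tgt_len, iterations=4):
    """Start from a fully masked target, then at each step keep the most confident
    predictions and re-mask the least confident ones for another refinement pass."""
    tokens = torch.full((1, tgt_len), MASK_ID, dtype=torch.long)
    probs = torch.zeros(1, tgt_len)
    for t in range(iterations):
        logits = model(tokens)
        new_probs, new_tokens = logits.softmax(-1).max(-1)
        # Only currently masked positions receive fresh predictions.
        masked = tokens.eq(MASK_ID)
        tokens = torch.where(masked, new_tokens, tokens)
        probs = torch.where(masked, new_probs, probs)
        # Linear schedule: re-mask the n least confident tokens for the next pass.
        n = int(tgt_len * (iterations - 1 - t) / iterations)
        if n == 0:
            break
        remask = probs.topk(n, largest=False).indices
        tokens.scatter_(1, remask, MASK_ID)
        probs.scatter_(1, remask, 0.0)
    return tokens

print(mask_predict(ToyMLM().eval(), tgt_len=8))

In the actual mask-predict setup the predictions are additionally conditioned on the encoded source sentence, and XLM-D's contribution is to obtain the underlying MLM from an off-the-shelf cross-lingual pre-trained model via its decorator rather than training a NAT decoder from scratch.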

Citation (APA)

Wang, Y., He, S., Chen, G., Chen, Y., & Jiang, D. (2022). XLM-D: Decorate Cross-lingual Pre-training Model as Non-Autoregressive Neural Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 6934–6946). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.466
