Abstract
While pre-training techniques work well in natural language processing, how to pre-train a decoder and effectively leverage it for neural machine translation (NMT) remains an open problem. The main reason is that the cross-attention module between the encoder and decoder cannot be pre-trained: when the separately pre-trained components are combined, the decoder's cross-attention receives inputs from encoder outputs it has never seen, so the joint model underperforms during fine-tuning. In this paper, we propose a better pre-training method for NMT that defines a semantic interface (SemFace) between the pre-trained encoder and the pre-trained decoder. Specifically, we propose two types of semantic interfaces: CL-SemFace, which uses cross-lingual embeddings as the interface, and VQ-SemFace, which employs vector-quantized embeddings to constrain the encoder outputs and decoder inputs to the same language-independent space. We conduct extensive experiments on six supervised translation pairs and three unsupervised pairs. The results demonstrate that SemFace effectively connects the pre-trained encoder and decoder, improving over previous pre-training-based NMT models by 3.7 and 1.5 BLEU points on the supervised and unsupervised tasks, respectively.
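For intuition, the following is a minimal NumPy sketch (not the authors' implementation) of the vector-quantization step that VQ-SemFace builds on: each encoder state is snapped to its nearest neighbor in a fixed codebook, so the decoder can be pre-trained against this stable, language-independent input space before the encoder and decoder are ever joined. All names, sizes, and the codebook initialization are illustrative assumptions.

```python
# Hypothetical sketch of the vector-quantization idea behind VQ-SemFace.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 256))   # 512 shared code vectors of dim 256 (illustrative)

def quantize(encoder_states: np.ndarray) -> np.ndarray:
    """Map each 256-dim state to its nearest codebook vector under L2 distance."""
    # Squared distances between every state and every code: shape (T, 512).
    d2 = ((encoder_states[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)          # index of closest code per token
    return codebook[nearest]             # replace states by their codes

# Toy "encoder outputs" for a 7-token sentence. After quantization, the
# decoder's cross-attention sees only vectors from the fixed codebook,
# i.e., the shared interface space described in the abstract.
states = rng.normal(size=(7, 256))
interface_states = quantize(states)
print(interface_states.shape)            # (7, 256)
```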
Citation
Ren, S., Zhou, L., Liu, S., Wei, F., Zhou, M., & Ma, S. (2021). SemFace: Pre-training encoder and decoder with a semantic interface for neural machine translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Vol. 1, pp. 4518–4527). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.348