On the Copying Behaviors of Pre-Training for Neural Machine Translation


Abstract

Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side effect of pre-training for NMT that stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Because the LM objective learns to reconstruct only a few source tokens and to copy most of them, pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called the copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard ones. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty consistently improves translation performance by controlling copying behaviors of pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
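As a rough illustration of the two quantities named in the abstract, the sketch below computes a token-level copying ratio (the fraction of generated target tokens that also occur in the source sentence) and applies a simple copying penalty to a hypothesis score during decoding. The function names, the penalty form, and the hyperparameter alpha are illustrative assumptions, not the exact formulation from the paper or the linked repository.

# Illustrative sketch only: the exact copying-ratio and copying-penalty
# definitions are given in the paper and the CopyingPenalty repository;
# the formulas and the hyperparameter `alpha` below are assumptions.

def copying_ratio(source_tokens, target_tokens):
    """Fraction of generated target tokens that also appear in the source."""
    if not target_tokens:
        return 0.0
    source_vocab = set(source_tokens)
    copied = sum(1 for tok in target_tokens if tok in source_vocab)
    return copied / len(target_tokens)


def penalized_score(log_prob, source_tokens, hypothesis_tokens, alpha=1.0):
    """Down-weight hypotheses that copy heavily from the source.

    alpha > 0 penalizes copying more strongly; alpha = 0 disables the
    penalty and recovers the original beam score.
    """
    ratio = copying_ratio(source_tokens, hypothesis_tokens)
    return log_prob - alpha * ratio


if __name__ == "__main__":
    src = "der schnelle braune Fuchs".split()
    hyp_copy = "der schnelle braune Fuchs".split()   # copies the source verbatim
    hyp_trans = "the quick brown fox".split()        # actually translates

    print(copying_ratio(src, hyp_copy))           # 1.0
    print(copying_ratio(src, hyp_trans))          # 0.0
    print(penalized_score(-2.0, src, hyp_copy))   # -3.0 (penalized)
    print(penalized_score(-2.0, src, hyp_trans))  # -2.0 (unchanged)

In practice such a penalty would be folded into the beam-search scoring function alongside the usual length penalty; the repository above contains the authors' actual implementation.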

Cite

APA

Liu, X., Wang, L., Wong, D. F., Ding, L., Chao, L. S., Shi, S., & Tu, Z. (2021). On the Copying Behaviors of Pre-Training for Neural Machine Translation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4265–4275). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.373
