Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation

26 citations · 49 Mendeley readers

Abstract

Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models: it eases NAT training at the cost of losing important information for translating low-frequency words. In this work, we provide an appealing alternative for NAT, monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data. Monolingual KD transfers both the knowledge of the original bilingual data (implicitly encoded in the trained AT teacher model) and that of the new monolingual data to the NAT student model. Extensive experiments on eight WMT benchmarks over two advanced NAT models show that monolingual KD consistently outperforms standard KD by improving low-frequency word translation, without introducing any additional computational cost. Monolingual KD also enjoys desirable expandability: given more computational budget, it can be further enhanced by combining it with standard KD, with a reverse monolingual KD, or by enlarging the scale of the monolingual data. Extensive analyses demonstrate that these techniques can be used together profitably to further recall the useful information lost in standard KD. Encouragingly, combined with standard KD, our approach achieves 30.4 and 34.1 BLEU points on the WMT14 English-German and German-English datasets, respectively. Our code and trained models are freely available at https://github.com/alphadl/RLFW-NAT.mono.
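The pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative sketch only: the "teacher" here is a toy stand-in (a lookup table with a placeholder fallback), and all function names are hypothetical, not the authors' actual implementation. The point is the data flow — the AT teacher, trained on the original bilingual data, labels external monolingual source sentences to build the distilled corpus the NAT student is trained on.

```python
# Illustrative sketch of monolingual KD (all names are hypothetical
# placeholders, not the paper's real code).

def train_at_teacher(bilingual_pairs):
    """Toy stand-in for training an AT teacher on bilingual data."""
    table = dict(bilingual_pairs)
    # Fallback "translation" marks unseen sources; a real teacher
    # would generalize instead.
    return lambda src: table.get(src, src.upper())

def monolingual_kd(teacher, monolingual_sources):
    """Teacher labels external monolingual data -> distilled corpus
    for the NAT student (vs. standard KD, which re-labels only the
    original bilingual sources)."""
    return [(src, teacher(src)) for src in monolingual_sources]

bilingual = [("hallo", "hello"), ("welt", "world")]
teacher = train_at_teacher(bilingual)
distilled = monolingual_kd(teacher, ["hallo", "danke"])
# distilled -> [("hallo", "hello"), ("danke", "DANKE")]
```

The NAT student would then be trained on `distilled`; the new monolingual sources can contain low-frequency words that the standard distilled bilingual corpus loses.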

Cite (APA)

Ding, L., Wang, L., Shi, S., Tao, D., & Tu, Z. (2022). Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 2417–2426). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.172
