Improving Low-Resource Neural Machine Translation with Teacher-Free Knowledge Distillation

Abstract

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information about similarities among categories provided by the teacher model, so in practice only strong teacher models are deployed to teach weaker students. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution that acts as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves translation performance on low-resource languages.
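The page does not include code, but as a rough illustration of the teacher-free KD idea described in the abstract, the sketch below combines standard cross-entropy with a KL term against a manually designed "virtual teacher" distribution that puts probability a on the gold token and spreads the rest uniformly over the vocabulary (the formulation popularized by Yuan et al., 2020). This is a minimal sketch under those assumptions, not the authors' implementation; names such as tf_kd_loss, a, alpha, and temperature are illustrative.

```python
# Minimal sketch of a teacher-free KD loss with a hand-crafted virtual teacher.
# Assumption: virtual teacher = probability `a` on the gold token, uniform elsewhere.
import torch
import torch.nn.functional as F


def tf_kd_loss(logits, targets, vocab_size, a=0.9, alpha=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with KL against a manually designed prior."""
    # Standard cross-entropy on the gold tokens.
    ce = F.cross_entropy(logits, targets)

    # Virtual teacher distribution: mass `a` on the gold token, uniform on the rest.
    smooth = (1.0 - a) / (vocab_size - 1)
    virtual_teacher = torch.full_like(logits, smooth)
    virtual_teacher.scatter_(1, targets.unsqueeze(1), a)

    # KL divergence between the temperature-softened student and the virtual teacher.
    log_student = F.log_softmax(logits / temperature, dim=1)
    kd = F.kl_div(log_student, virtual_teacher, reduction="batchmean") * temperature ** 2

    return (1.0 - alpha) * ce + alpha * kd


# Toy usage: a batch of 4 target tokens over a 32k-word vocabulary.
if __name__ == "__main__":
    logits = torch.randn(4, 32000)
    targets = torch.randint(0, 32000, (4,))
    print(tf_kd_loss(logits, targets, vocab_size=32000).item())
```

In an NMT setting the same loss would be applied per target position of the decoder output; the weights alpha and temperature are hyperparameters to be tuned, and the exact blending used in the paper may differ.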

Citation (APA)

Zhang, X., Li, X., Yang, Y., & Dong, R. (2020). Improving Low-Resource Neural Machine Translation with Teacher-Free Knowledge Distillation. IEEE Access, 8, 206638–206645. https://doi.org/10.1109/ACCESS.2020.3037821
