Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

Abstract

Past work on multimodal machine translation (MMT) augments the bilingual setup by incorporating additional aligned visual information. However, the image-must requirement of multimodal datasets largely hinders MMT's development: it demands an aligned triplet of [image, source text, target text]. This limitation is especially troublesome at inference time, when an aligned image is often unavailable, as in the standard NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports an image-free inference phase via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is trained with a knowledge distillation module to generate the multimodal feature directly from the source text alone. While a few prior works have explored image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, we identify our method as the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, achieving state-of-the-art results on the widely used Multi30k benchmark.
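
To make the abstract's scheme concrete, below is a minimal PyTorch sketch of the inversion knowledge distillation idea: a generator maps source-text features to a pseudo multimodal feature, supervised at training time by a teacher image encoder, so that no image is required at inference. The module names, dimensions, MSE distillation loss, and loss weighting are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalFeatureGenerator(nn.Module):
    """Hypothetical generator mapping a pooled source-text representation
    to a pseudo visual/multimodal feature (image-free at inference)."""
    def __init__(self, text_dim: int = 512, feat_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, text_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, text_dim) pooled source-text features
        return self.net(text_feats)

def ikd_loss(generated: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
    """Distillation loss pulling the generated feature toward the frozen
    teacher image feature (MSE is an assumed choice here)."""
    return F.mse_loss(generated, image_feats)

# Sketch of a training step (the image encoder acts as a teacher and is
# only needed during training; encoders below are hypothetical):
#   gen = MultimodalFeatureGenerator()
#   text_feats  = text_encoder(src_tokens)          # source-text encoder
#   image_feats = image_encoder(images).detach()    # frozen teacher features
#   loss = translation_loss + lambda_kd * ikd_loss(gen(text_feats), image_feats)
# At inference, gen(text_feats) replaces the real image feature entirely.
```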

Citation (APA)

Peng, R., Zeng, Y., & Zhao, J. (2022). Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 2379–2390). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.152
