Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Abstract

Recent work has questioned the necessity of visual information in Multimodal Machine Translation (MMT). This paper revisits that question and builds a new benchmark. Because the existing datasets are simple and their text inputs are largely self-sufficient, we introduce a challenging dataset, EMMT, whose test set is deliberately designed to be ambiguous. More importantly, we study the problem in a real-world scenario, aiming to make the most of the available multimodal training data. We propose a new framework, 2/3-Triplet, which naturally makes full use of large-scale image-text and parallel text-only data. Extensive experiments show that visual information is crucial on EMMT: the proposed 2/3-Triplet outperforms a strong text-only competitor by 3.8 BLEU and even surpasses a commercial translation system.
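The abstract only names the three data sources involved (source-target-image triplets, image-text pairs, and parallel text), not the training objective. As a hedged illustration of how such heterogeneous data could share one encoder-decoder, here is a minimal PyTorch sketch; all module names, dimensions, the additive image fusion, and the cosine alignment term are assumptions for exposition, not the 2/3-Triplet implementation from the paper.

```python
# Illustrative sketch only: the abstract does not specify the 2/3-Triplet
# architecture or losses, so modules, dimensions, and objectives below are
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn

d = 256  # hypothetical shared hidden size

text_encoder = nn.LSTM(input_size=d, hidden_size=d, batch_first=True)
image_encoder = nn.Linear(2048, d)   # e.g. projecting pre-extracted image features
decoder = nn.Linear(d, 32000)        # toy stand-in for a target-vocabulary decoder
ce = nn.CrossEntropyLoss()

def translation_loss(src_emb, tgt_ids, img_feat=None):
    """Toy seq2seq loss; optionally fuses an image feature into the source states."""
    enc, _ = text_encoder(src_emb)                        # (B, T, d)
    if img_feat is not None:
        enc = enc + image_encoder(img_feat).unsqueeze(1)  # naive additive fusion (assumption)
    logits = decoder(enc)                                 # (B, T, V)
    return ce(logits.reshape(-1, logits.size(-1)), tgt_ids.reshape(-1))

# Fake mini-batches for the three data sources the abstract mentions:
B, T = 4, 10
triplet = (torch.randn(B, T, d), torch.randint(0, 32000, (B, T)), torch.randn(B, 2048))
image_text = (torch.randn(B, T, d), torch.randn(B, 2048))   # caption + image, no translation
text_only = (torch.randn(B, T, d), torch.randint(0, 32000, (B, T)))

loss = translation_loss(triplet[0], triplet[1], triplet[2])   # full (src, tgt, img) triplets
loss = loss + translation_loss(text_only[0], text_only[1])    # parallel text without images
# For image-text pairs, one possible choice is an alignment term such as a
# cosine objective between caption and image representations (assumption):
cap_repr = text_encoder(image_text[0])[0].mean(dim=1)
img_repr = image_encoder(image_text[1])
loss = loss + (1 - nn.functional.cosine_similarity(cap_repr, img_repr).mean())
loss.backward()
```

The point of the sketch is only that each data source can contribute its own loss term to a shared model; the paper itself should be consulted for the actual fusion and training scheme.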

Citation (APA)

Zhu, Y., Sun, Z., Cheng, S., Huang, L., Wu, L., & Wang, M. (2023). Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 2679–2697). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.168
