Abstract
Recent work has questioned the necessity of visual information in Multimodal Machine Translation (MMT). This paper addresses that question and builds a new benchmark. Because the existing dataset is simple and its text input is self-sufficient, we introduce a challenging dataset called EMMT, whose test set is deliberately designed to ensure ambiguity. More importantly, we study this problem in a real-world scenario, aiming to make the most of multimodal training data. We propose a new framework, 2/3-Triplet, which naturally makes full use of large-scale image-text and parallel text-only data. Extensive experiments show that visual information is highly crucial on EMMT. The proposed 2/3-Triplet outperforms a strong text-only competitor by 3.8 BLEU points and even surpasses a commercial translation system.
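The abstract only names the idea of drawing on image-text pairs and parallel text-only data alongside full triplets. The sketch below is a minimal, hypothetical illustration of how such mixed pools might be sampled during training; the function and variable names are our own assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): mix three data pools so every example
# carries at least two of the three fields (image, source, target).
import random

def sample_example(triplets, image_text_pairs, text_pairs):
    """Pick one example from a randomly chosen pool; absent fields are None."""
    pool = random.choice(["triplet", "image_text", "text_only"])
    if pool == "triplet":
        image, src, tgt = random.choice(triplets)
    elif pool == "image_text":
        image, src = random.choice(image_text_pairs)
        tgt = None          # no reference translation for this pair
    else:
        src, tgt = random.choice(text_pairs)
        image = None        # no visual context for this pair
    return {"image": image, "source": src, "target": tgt}

# Toy usage with made-up data: a multimodal translation model would consume
# whichever modalities are present in each sampled example.
triplets = [("img_001.jpg", "a bat on the table", "une chauve-souris sur la table")]
image_text_pairs = [("img_002.jpg", "a mouse next to the keyboard")]
text_pairs = [("the bank of the river", "la rive du fleuve")]
print(sample_example(triplets, image_text_pairs, text_pairs))
```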
Zhu, Y., Sun, Z., Cheng, S., Huang, L., Wu, L., & Wang, M. (2023). Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 2679–2697). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.168