Video-Helpful Multimodal Machine Translation


Abstract

Existing multimodal machine translation (MMT) datasets consist of images with captions or instructional videos with subtitles, which rarely contain linguistic ambiguity, making visual information ineffective for generating appropriate translations. Recent work constructed an ambiguous-subtitles dataset to alleviate this problem, but it remains limited by the fact that the videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which the subtitles are ambiguous and the videos are guaranteed to be helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods, Frame attention loss and Ambiguity augmentation, aiming to fully exploit the videos in EVA for disambiguation. Experiments on EVA show that visual information and the proposed methods boost translation performance, and that our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git.
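The abstract describes a model built on the Selective Attention approach, in which text representations attend over video-frame features before fusion. The following is a minimal NumPy sketch of that general mechanism, not the authors' implementation: the function name, the dot-product attention, and the sigmoid gate are assumptions based on the abstract's one-line description, and the frame attention loss and ambiguity augmentation are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_attention(text_h, frame_h):
    """Hypothetical sketch of selective attention for video MMT.

    text_h:  (T, d) token states from the text encoder
    frame_h: (F, d) per-frame visual features from the video clip

    Each token attends over the frames, and a scalar gate decides how much
    of the attended visual context is mixed into the token representation.
    """
    d = text_h.shape[-1]
    scores = text_h @ frame_h.T / np.sqrt(d)      # (T, F) token-to-frame scores
    attn = softmax(scores, axis=-1)               # rows sum to 1
    visual_ctx = attn @ frame_h                   # (T, d) attended visual context
    # Sigmoid gate from the text-visual agreement (an illustrative choice).
    gate = 1.0 / (1.0 + np.exp(-(text_h * visual_ctx).sum(-1, keepdims=True)))
    fused = text_h + gate * visual_ctx            # (T, d) fused representation
    return fused, attn

# Toy usage: 5 tokens, 8 frames, 16-dim features.
rng = np.random.default_rng(0)
fused, attn = selective_attention(rng.normal(size=(5, 16)),
                                  rng.normal(size=(8, 16)))
```

In a trained model the queries, keys, and values would pass through learned projections; the sketch drops them to keep the attention-then-gated-fusion structure visible.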


Citation (APA)

Li, Y., Shimizu, S., Chu, C., Kurohashi, S., & Li, W. (2023). Video-Helpful Multimodal Machine Translation. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 4281–4299). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.260


Readers' Seniority: PhD / Post grad / Masters / Doc — 2 (67%); Lecturer / Post doc — 1 (33%)

Readers' Discipline: Computer Science — 4 (80%); Medicine and Dentistry — 1 (20%)
