Multimodal Robustness for Neural Machine Translation

Abstract

In this paper, we look at the case of a generic text-to-text NMT model that has to deal with data coming from various modalities, such as speech, images, or noisy text extracted from the web. We propose a two-step method, based on composable adapters, to address this problem of multimodal robustness. In the first step, we separately learn domain adapters and modality-specific adapters to handle noisy input coming from various sources: ASR, OCR, or noisy user-generated text (UGC). In the second step, we combine these components at runtime via dynamic routing or, when the source of noise is unknown, via two new transfer learning mechanisms (Fast Fusion and Multi Fusion). We show that our method provides a flexible, state-of-the-art architecture able to deal with noisy multimodal inputs.
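To make the two-step idea concrete, below is a minimal PyTorch sketch of composable bottleneck adapters with runtime selection. All names (Adapter, ComposableAdapterLayer, the modality keys "asr"/"ocr"/"ugc", and the dimensions) are illustrative assumptions, and the uniform averaging used when the noise source is unknown is only a stand-in for the paper's learned Fast Fusion / Multi Fusion mechanisms, which are not detailed in the abstract.

```python
from typing import Optional
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class ComposableAdapterLayer(nn.Module):
    """Sketch of runtime composition: a domain adapter is always applied,
    then a modality-specific adapter is either selected by name (dynamic
    routing, when the noise source is known) or all modality adapters are
    averaged (a naive placeholder for a learned fusion)."""

    def __init__(self, d_model: int = 512, modalities=("asr", "ocr", "ugc")):
        super().__init__()
        self.domain_adapter = Adapter(d_model)
        self.modality_adapters = nn.ModuleDict(
            {m: Adapter(d_model) for m in modalities}
        )

    def forward(self, hidden: torch.Tensor, modality: Optional[str] = None) -> torch.Tensor:
        hidden = self.domain_adapter(hidden)
        if modality is not None:
            # Known noise source: route to the matching adapter.
            return self.modality_adapters[modality](hidden)
        # Unknown source: uniform average over all modality adapters.
        outputs = [a(hidden) for a in self.modality_adapters.values()]
        return torch.stack(outputs, dim=0).mean(dim=0)


# Usage on hidden states of shape (batch, seq_len, d_model).
layer = ComposableAdapterLayer()
h = torch.randn(2, 10, 512)
routed = layer(h, modality="asr")  # noise source known
fused = layer(h)                   # noise source unknown
```

In the paper's setting, such adapter modules would sit inside a frozen generic NMT encoder/decoder, so only the small adapter weights are trained per domain or per noise source and can be mixed freely at inference time.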

Citation (APA)
Zhao, Y., & Calapodescu, I. (2022). Multimodal Robustness for Neural Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 8505–8516). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.582
