Read, spot and translate

Lucia Specia; Josiah Wang; Sun Jae Lee; Alissa Ostapenko; Pranava Madhyastha

Journal Article (Open Access)

Machine Translation (2021) 35(2) 145-165

DOI: 10.1007/s10590-021-09259-z

Abstract

We propose multimodal machine translation (MMT) approaches that exploit the correspondences between words and image regions. In contrast to existing work, our referential grounding method considers objects as the visual unit for grounding, rather than whole images or abstract image regions, and performs visual grounding in the source language, rather than at the decoding stage via attention. We explore two referential grounding approaches: (i) implicit grounding, where the model jointly learns how to ground the source language in the visual representation and to translate; and (ii) explicit grounding, where grounding is performed independently of the translation model and is subsequently used to guide machine translation. We performed experiments on the Multi30K dataset for three language pairs: English–German, English–French and English–Czech. Our referential grounding models outperform existing MMT models according to automatic and human evaluation metrics.

Specia, L., Wang, J., Lee, S. J., Ostapenko, A., & Madhyastha, P. (2021). Read, spot and translate. Machine Translation, 35(2), 145–165. https://doi.org/10.1007/s10590-021-09259-z
