Image pivoting for learning multilingual multimodal representations

52 citations · 135 Mendeley readers

Abstract

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
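The loss itself is defined in the paper; purely as an illustration, the PyTorch sketch below implements a generic symmetric max-margin pairwise ranking loss with the image embedding pivoting captions in two languages. The function name, the margin value, and the assumption of L2-normalised embeddings are illustrative choices, not the authors' implementation; the asymmetric variant mentioned in the abstract (in the spirit of order embeddings) is omitted here.

import torch
import torch.nn.functional as F

def symmetric_ranking_loss(img, cap_l1, cap_l2, margin=0.2):
    # img, cap_l1, cap_l2: (batch, dim) L2-normalised embeddings;
    # row i of each caption matrix describes image i (hypothetical setup).
    loss = 0.0
    for cap in (cap_l1, cap_l2):            # the image pivots both languages
        scores = img @ cap.t()              # cosine similarity matrix
        pos = scores.diag().unsqueeze(1)    # similarities of matching pairs
        # hinge over contrastive captions (same row) and contrastive
        # images (same column), ignoring the matching diagonal
        cost_cap = (margin + scores - pos).clamp(min=0)
        cost_img = (margin + scores - pos.t()).clamp(min=0)
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        loss = loss + cost_cap.masked_fill(mask, 0).sum() \
                    + cost_img.masked_fill(mask, 0).sum()
    return loss

# Toy usage with random, normalised embeddings:
img = F.normalize(torch.randn(8, 128), dim=1)
cap_en = F.normalize(torch.randn(8, 128), dim=1)
cap_de = F.normalize(torch.randn(8, 128), dim=1)
print(symmetric_ranking_loss(img, cap_en, cap_de))

Drawing contrastive examples from the rest of the minibatch, as above, is a common choice; note that the two caption sets need not be translations of each other, since each is tied to the image independently.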

Citation (APA)

Gella, S., Sennrich, R., Keller, F., & Lapata, M. (2017). Image pivoting for learning multilingual multimodal representations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2839–2845). Association for Computational Linguistics. https://doi.org/10.18653/v1/d17-1303
