Abstract
In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
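The abstract mentions a pairwise ranking loss over a shared image–sentence embedding space but does not give its formula; the paper's exact symmetric/asymmetric formulation is not reproduced here. As a rough illustration of the general idea, the sketch below implements only a standard symmetric max-margin ranking loss over a batch of paired embeddings — the function and parameter names (`symmetric_ranking_loss`, `margin`) are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between every row of a and every row of b -> (n, m).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def symmetric_ranking_loss(img, cap, margin=0.1):
    # Generic symmetric max-margin ranking loss (illustrative, not the
    # paper's exact objective): matching (image, caption) pairs sit on the
    # diagonal of the similarity matrix; every non-matching pair in the
    # batch should score lower than its matching pair by at least `margin`.
    s = cosine_sim(img, cap)          # (n, n) similarity matrix
    pos = np.diag(s)                  # similarities of the true pairs
    # Rank captions given an image, and images given a caption (symmetric).
    cost_cap = np.maximum(0.0, margin - pos[:, None] + s)
    cost_img = np.maximum(0.0, margin - pos[None, :] + s)
    mask = 1.0 - np.eye(s.shape[0])   # exclude the true pair itself
    return ((cost_cap + cost_img) * mask).sum() / s.shape[0]
```

With perfectly aligned, well-separated embeddings the loss is zero; misaligned pairs incur a positive penalty. The paper's asymmetric variant additionally allows the two ranking directions to be weighted or structured differently.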
Citation
Gella, S., Sennrich, R., Keller, F., & Lapata, M. (2017). Image pivoting for learning multilingual multimodal representations. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 2839–2845). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d17-1303