Multi-hop Interactive Cross-Modal Retrieval

Abstract

Conventional representation-learning-based cross-modal retrieval approaches typically represent a sentence with a single global embedding, which tends to neglect the local correlations between objects in the image and phrases in the sentence. In this paper, we present a novel Multi-hop Interactive Cross-modal Retrieval Model (MICRM), which interactively exploits the local correlations between image regions and words. We design a multi-hop interactive module to infer the high-order relevance between an image and a sentence. Experimental results on two benchmark datasets, MS-COCO and Flickr30K, demonstrate that our multi-hop interactive model performs significantly better than several competitive cross-modal retrieval methods.
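The abstract describes iterative ("multi-hop") interaction between image region features and sentence words rather than matching two global embeddings. The sketch below is an illustrative interpretation of that idea, not the authors' implementation: it assumes precomputed region features and word embeddings, and the layer choices (a GRU cell for the hop update, a linear scoring head) and the number of hops are hypothetical.

```python
# Illustrative multi-hop cross-modal interaction sketch (not the MICRM code).
# Assumes precomputed image region features and word embeddings; all layer
# sizes, the GRU-based hop update, and num_hops are hypothetical choices.
import torch
import torch.nn as nn


class MultiHopInteraction(nn.Module):
    def __init__(self, dim: int = 512, num_hops: int = 3):
        super().__init__()
        self.num_hops = num_hops
        self.query_init = nn.Linear(dim, dim)   # initial sentence-side query
        self.hop_update = nn.GRUCell(dim, dim)  # refines the query after each hop
        self.score_head = nn.Linear(dim, 1)     # final image-sentence relevance score

    def forward(self, regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        """
        regions: (batch, num_regions, dim) image region features
        words:   (batch, num_words, dim)   word embedding features
        returns: (batch,) relevance scores
        """
        # Start from a pooled sentence representation.
        query = self.query_init(words.mean(dim=1))
        for _ in range(self.num_hops):
            # Attend over image regions with the current query (local correlation).
            attn = torch.softmax(
                torch.bmm(regions, query.unsqueeze(-1)).squeeze(-1), dim=-1
            )
            context = torch.bmm(attn.unsqueeze(1), regions).squeeze(1)
            # Fold the attended visual context back into the query (one hop).
            query = self.hop_update(context, query)
        return self.score_head(query).squeeze(-1)


if __name__ == "__main__":
    model = MultiHopInteraction()
    regions = torch.randn(2, 36, 512)   # e.g. 36 detected regions per image
    words = torch.randn(2, 12, 512)     # e.g. 12 words per sentence
    print(model(regions, words).shape)  # torch.Size([2])
```

In such a setup, repeating the attend-and-update loop lets later hops condition on visual evidence gathered in earlier hops, which is one way to realize the "high-order relevance" the abstract refers to.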

Citation (APA)

Ning, X., Yang, X., & Xu, C. (2020). Multi-hop Interactive Cross-Modal Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11962 LNCS, pp. 681–693). Springer. https://doi.org/10.1007/978-3-030-37734-2_55
