Conventional representation-learning approaches to cross-modal retrieval typically represent a sentence with a single global embedding, which tends to neglect the local correlations between objects in the image and phrases in the sentence. In this paper, we present a novel Multi-hop Interactive Cross-modal Retrieval Model (MICRM), which interactively exploits the local correlations between image regions and words. We design a multi-hop interactive module to infer high-order relevance between an image and a sentence. Experimental results on two benchmark datasets, MS-COCO and Flickr30K, demonstrate that our multi-hop interactive model performs significantly better than several competitive cross-modal retrieval methods.
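The abstract does not specify the architecture in detail, so the following is only a minimal sketch of one plausible reading of a "multi-hop interactive module": a sentence-level query repeatedly attends over image region features and is refined at each hop before a final retrieval score is computed. All class names, dimensions, and the specific attention/update equations below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopInteraction(nn.Module):
    """Hypothetical multi-hop image-sentence interaction (illustrative only)."""

    def __init__(self, dim, num_hops=3):
        super().__init__()
        self.num_hops = num_hops
        # fuses the current query with the attended visual context
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, word_feats, region_feats):
        # word_feats:   (batch, num_words, dim)   word/phrase embeddings
        # region_feats: (batch, num_regions, dim) image region embeddings
        query = word_feats.mean(dim=1)  # initial sentence-level query
        for _ in range(self.num_hops):
            # attention of the current query over image regions
            scores = torch.bmm(region_feats, query.unsqueeze(2)).squeeze(2)
            attn = F.softmax(scores, dim=1)
            context = torch.bmm(attn.unsqueeze(1), region_feats).squeeze(1)
            # refine the query with the attended visual context (one "hop")
            query = torch.tanh(self.update(torch.cat([query, context], dim=1)))
        # similarity score used to rank images against the sentence
        image_global = region_feats.mean(dim=1)
        return F.cosine_similarity(query, image_global, dim=1)
```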
Ning, X., Yang, X., & Xu, C. (2020). Multi-hop Interactive Cross-Modal Retrieval. In Lecture Notes in Computer Science, vol. 11962, pp. 681–693. Springer. https://doi.org/10.1007/978-3-030-37734-2_55