Cross-modal retrieval via memory network

Abstract

With the explosive growth of multimedia data on the Internet, cross-modal retrieval has attracted a great deal of attention in the computer vision and multimedia communities. However, this task is very challenging due to the heterogeneity gap between different modalities. Current approaches typically involve a common representation learning process that maps data from different modalities into a common space by linear or nonlinear functions. Yet most of them 1) only handle the dual-modal situation and generalize poorly to more complex cases; 2) require example-level alignment of training data, which is often prohibitively expensive in practical applications; and 3) do not fully exploit prior knowledge about the different modalities during the mapping process. In this paper, we address these issues by casting common representation learning as a Question Answering problem via a cross-modal memory neural network (CMMN). Specifically, the raw features of all modalities are treated as 'Questions', and an extra discriminator is used to select high-quality ones as 'Statements' for storage, while the common features serve as the desired 'Answers'. Experimental results show that CMMN achieves state-of-the-art performance on the Wiki and COCO datasets and outperforms other baselines on the large-scale scene dataset CMPlaces.
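To make the Question/Statement/Answer framing concrete, below is a minimal sketch of a memory-network read step in PyTorch: a modality-specific 'Question' attends over stored 'Statements' and the read-out acts as the common-space 'Answer'. All module names, dimensions, and the learnable memory slots are illustrative assumptions, not the authors' implementation; in the paper, the stored statements are high-quality features selected by a discriminator rather than free parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMemorySketch(nn.Module):
    """Illustrative memory read: raw modality features ('Question') attend
    over stored 'Statements' and return a common-space 'Answer'."""

    def __init__(self, feat_dim, common_dim, num_slots):
        super().__init__()
        # Project raw modality features (the 'Question') into the common space.
        self.question_proj = nn.Linear(feat_dim, common_dim)
        # Memory of 'Statements'. In the paper these are discriminator-selected
        # features; here they are learnable slots purely for brevity.
        self.memory = nn.Parameter(torch.randn(num_slots, common_dim))

    def forward(self, raw_features):
        q = self.question_proj(raw_features)   # (batch, common_dim)
        scores = q @ self.memory.t()           # (batch, num_slots)
        attn = F.softmax(scores, dim=-1)       # soft addressing of the memory
        answer = attn @ self.memory            # weighted read-out ('Answer')
        # Residual combination of the question and the memory read-out.
        return q + answer
```

Under this reading, each modality (e.g., image and text) would have its own projection but share the statement memory, so the resulting 'Answers' live in one common space and can be compared directly (e.g., by cosine similarity) for retrieval.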

Cite

APA

Song, G., & Tan, X. (2017). Cross-modal retrieval via memory network. In British Machine Vision Conference 2017, BMVC 2017. BMVA Press. https://doi.org/10.5244/c.31.178
