Abstract
With the explosive growth of multimedia data on the Internet, cross-modal retrieval has attracted a great deal of attention in the computer vision and multimedia communities. The task remains challenging, however, because of the heterogeneity gap between modalities. Current approaches typically learn a common representation that maps data from different modalities into a shared space via linear or nonlinear functions. Yet most of them 1) handle only the dual-modal setting and generalize poorly to more complex cases; 2) require example-level alignment of the training data, which is often prohibitively expensive in practice; and 3) do not fully exploit prior knowledge about the different modalities during the mapping process. In this paper, we address these issues by casting common representation learning as a question answering (QA) problem via a cross-modal memory neural network (CMMN). Specifically, the raw features of all modalities are treated as 'Questions', an extra discriminator selects high-quality ones as 'Statements' for storage in memory, and the common features are the desired 'Answers'. Experimental results show that CMMN achieves state-of-the-art performance on the Wiki and COCO datasets and outperforms other baselines on the large-scale scene dataset CMPlaces.
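Although the abstract gives no implementation details, the QA analogy it describes maps naturally onto a standard memory-network read step. The PyTorch sketch below is illustrative only: the MemoryReader class, the dimensions, and the single-hop additive read are assumptions rather than the authors' exact CMMN architecture, and the discriminator that selects 'Statements' is abstracted away as a pre-filtered memory tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryReader(nn.Module):
    """Minimal single-hop memory-network read: a raw modality feature
    (the 'Question') attends over stored 'Statements' and returns a
    common-space representation (the 'Answer'). Layer choices and
    dimensions are illustrative assumptions, not the paper's design."""

    def __init__(self, q_dim, m_dim, common_dim):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, common_dim)  # embed the 'Question'
        self.m_key = nn.Linear(m_dim, common_dim)   # keys for stored 'Statements'
        self.m_val = nn.Linear(m_dim, common_dim)   # values for stored 'Statements'

    def forward(self, question, memory):
        # question: (B, q_dim) raw modality features
        # memory:   (N, m_dim) high-quality 'Statements' kept in storage
        q = self.q_proj(question)               # (B, d)
        keys = self.m_key(memory)               # (N, d)
        vals = self.m_val(memory)               # (N, d)
        attn = F.softmax(q @ keys.t(), dim=-1)  # (B, N) soft memory addressing
        read = attn @ vals                      # (B, d) attention-weighted read
        return q + read                         # the 'Answer': common representation

# Toy usage with made-up sizes.
reader = MemoryReader(q_dim=128, m_dim=256, common_dim=64)
img_feat = torch.randn(4, 128)     # e.g. image features posed as 'Questions'
stored = torch.randn(50, 256)      # 'Statements' pre-selected by a discriminator
common = reader(img_feat, stored)  # common-space 'Answers'
print(common.shape)                # torch.Size([4, 64])
```

In this sketch, retrieval across modalities would amount to embedding each modality's queries through such a reader and comparing the resulting common-space vectors, e.g. by cosine similarity.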
Citation
Song, G., & Tan, X. (2017). Cross-modal retrieval via memory network. In British Machine Vision Conference 2017, BMVC 2017. BMVA Press. https://doi.org/10.5244/c.31.178