Improving what cross-modal retrieval models learn through object-oriented inter- and intra-modal attention networks

13 citations · 22 Mendeley readers

Abstract

Although significant progress has been made on cross-modal retrieval models in recent years, few studies have explored what those models truly learn and what makes one model superior to another. Starting by training two state-of-the-art text-to-image retrieval models with adversarial text inputs, we investigate and quantify the importance of syntactic structure and lexical information in learning the joint visual-semantic embedding space for cross-modal retrieval. The results show that the retrieval power mainly comes from localizing and connecting the visual objects and their cross-modal counterparts, the textual phrases. Inspired by this observation, we propose a novel model which employs object-oriented encoders along with inter- and intra-modal attention networks to improve inter-modal dependencies for cross-modal retrieval. In addition, we develop a new multimodal structure-preserving objective which additionally emphasizes intra-modal hard negative examples to promote intra-modal discrepancies. Extensive experiments show that the proposed approach outperforms the existing best method by a large margin (16.4% and 6.7% relative improvement in Recall@1 for the text-to-image retrieval task on the Flickr30K and MS-COCO datasets, respectively).
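The key technical ingredient described above, a ranking objective that mines hard negatives both across modalities and within each modality, can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the cosine similarity, margin, intra-modal weighting, and all function names are assumptions, and the paper's actual formulation may differ.

```python
# Hedged sketch of a structure-preserving ranking loss with cross-modal and
# intra-modal hard negatives. Names and hyperparameters are illustrative only.
import torch
import torch.nn.functional as F

def cosine_sim(a, b):
    """Pairwise cosine similarity between two batches of embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return a @ b.t()

def structure_preserving_loss(img_emb, txt_emb, margin=0.2, intra_weight=0.5):
    """Hinge-based ranking loss over the hardest negatives in a batch.

    img_emb, txt_emb: (batch, dim) joint-space embeddings where row i of each
    tensor corresponds to the same matched image-caption pair.
    """
    n = img_emb.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=img_emb.device)

    # Inter-modal term (image <-> text) with hardest in-batch negatives.
    sim = cosine_sim(img_emb, txt_emb)                 # (n, n)
    pos = sim.diag()                                   # matched-pair similarity
    sim_neg = sim.masked_fill(eye, float('-inf'))
    cost_i2t = F.relu(margin + sim_neg.max(dim=1).values - pos)  # hardest caption per image
    cost_t2i = F.relu(margin + sim_neg.max(dim=0).values - pos)  # hardest image per caption
    inter = (cost_i2t + cost_t2i).mean()

    # Intra-modal terms: push each embedding away from its hardest
    # same-modality negative, using the matched cross-modal pair as the anchor.
    def intra(emb):
        s = cosine_sim(emb, emb).masked_fill(eye, float('-inf'))
        hardest = s.max(dim=1).values
        return F.relu(margin + hardest - pos).mean()

    return inter + intra_weight * (intra(img_emb) + intra(txt_emb))
```

Under these assumptions, the loss would be called once per mini-batch of matched image-caption embeddings produced by the object-oriented encoders; the intra-modal terms are what distinguish it from a standard hardest-negative objective such as VSE++.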

Cite (APA)

Huang, P. Y., Vaibhav, Chang, X., & Hauptmann, A. G. (2019). Improving what cross-modal retrieval models learn through object-oriented inter- and intra-modal attention networks. In ICMR 2019 - Proceedings of the 2019 ACM International Conference on Multimedia Retrieval (pp. 244–252). Association for Computing Machinery, Inc. https://doi.org/10.1145/3323873.3325043
