Cross-modal retrieval with discriminative dual-path CNN

Abstract

Cross-modal retrieval aims at searching for semantically similar examples in one modality by using a query from another modality. Its typical applications include image-based text retrieval (IBTR) and text-based image retrieval (TBIR). Due to the rapid growth of multimodal data and the success of deep learning, cross-modal retrieval has received increasing attention and achieved significant progress in recent years. The dual-path CNN is a novel framework in this domain, which yields competitive performance by utilizing an instance loss and an inter-modal loss. However, it remains less discriminative in modeling the intra-modal relationship, which is also important for building a more discriminative cross-modal embedding network. To this end, we propose to incorporate an additional intra-modal loss into the framework to remedy this problem by preserving the intra-modal structure. Further, we develop a novel batch flexible sampling approach to train the entire network effectively and efficiently. Our approach, named Discriminative Dual-Path CNN (DDPC), achieves state-of-the-art results on the MS-COCO dataset, improving IBTR by 4.9% and TBIR by 5.9% in terms of Recall@1 on the 5K test set.
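The abstract does not give the exact form of the losses, but the idea of combining an inter-modal term (aligning image–text pairs) with intra-modal terms (preserving structure within each modality) can be sketched as follows. This is a minimal illustration, assuming a standard hinge-based bidirectional ranking loss over cosine similarities; the actual loss functions, weights, and sampling scheme of DDPC are defined in the paper, not here.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def ranking_loss(sim, margin=0.2):
    # Hinge-based bidirectional ranking loss: matched pairs lie on the
    # diagonal of `sim`; every off-diagonal entry acts as a negative.
    n = sim.shape[0]
    pos = np.diag(sim)
    cost_q = np.maximum(0.0, margin + sim - pos[:, None])  # query -> gallery
    cost_g = np.maximum(0.0, margin + sim - pos[None, :])  # gallery -> query
    mask = 1.0 - np.eye(n)  # exclude the positive pair itself
    return float(((cost_q + cost_g) * mask).sum() / n)

def total_loss(img_emb, txt_emb, w_inter=1.0, w_intra=0.5):
    # Inter-modal term aligns each image with its paired text; the
    # intra-modal terms (the addition motivated above, sketched here as the
    # same hinge loss applied within each modality) push apart embeddings
    # of different instances inside one modality. The weights are
    # illustrative, not taken from the paper.
    inter = ranking_loss(cosine_sim(img_emb, txt_emb))
    intra = (ranking_loss(cosine_sim(img_emb, img_emb)) +
             ranking_loss(cosine_sim(txt_emb, txt_emb)))
    return w_inter * inter + w_intra * intra
```

In a real training loop these terms would be computed on mini-batch embeddings produced by the two CNN paths and minimized jointly.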

Citation (APA)

Wang, H., Ji, Z., & Pang, Y. (2018). Cross-modal retrieval with discriminative dual-path CNN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11166 LNCS, pp. 384–394). Springer Verlag. https://doi.org/10.1007/978-3-030-00764-5_35
