Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective

Abstract

The main challenge of cross-modal retrieval is to learn consistent embeddings for heterogeneous modalities. To solve this problem, traditional label-wise cross-modal approaches usually constrain inter-modal and intra-modal embedding consistency by relying on the label ground-truths. However, experiments reveal that different modal networks actually have different generalization capacities, so end-to-end joint training with a consistency loss usually leads to sub-optimal uni-modal models, which in turn hinders the learning of consistent embeddings. Therefore, in this paper, we argue that what is really needed for supervised cross-modal retrieval is a good shared classification model. In other words, we learn consistent embeddings by ensuring the classification performance of each modality on the shared model, without any consistency loss. Specifically, we propose a technique called Semantic Sharing, which trains the two modalities interactively through a shared self-attention based classification model. We evaluate the proposed approach on three representative datasets. The results validate that semantic sharing consistently boosts performance under the NDCG metric.
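The core idea above, routing features from both modalities through one shared self-attention classifier rather than tying them together with a consistency loss, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation; the dimensions, projection matrices, and class names are illustrative assumptions, with only lightweight modality-specific projections kept outside the shared model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedSelfAttentionClassifier:
    """Self-attention block + classifier head shared by both modalities."""
    def __init__(self, d, n_classes):
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wc = rng.standard_normal((d, n_classes)) / np.sqrt(d)

    def forward(self, tokens):
        # tokens: (n_tokens, d) -- projected features from either modality
        q, k, v = tokens @ self.Wq, tokens @ self.Wk, tokens @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))
        pooled = (attn @ v).mean(axis=0)          # aggregate attended tokens
        return softmax(pooled @ self.Wc)          # class probabilities

d, n_classes = 16, 5
shared = SharedSelfAttentionClassifier(d, n_classes)

# Modality-specific projections are the only non-shared parameters here
# (hypothetical sizes: e.g. 2048-d CNN regions, 300-d word embeddings).
W_img = rng.standard_normal((2048, d)) / np.sqrt(2048)
W_txt = rng.standard_normal((300, d)) / np.sqrt(300)

img_tokens = rng.standard_normal((7, 2048)) @ W_img    # 7 image regions
txt_tokens = rng.standard_normal((12, 300)) @ W_txt    # 12 word tokens

# Both modalities pass through the SAME classifier; training would
# alternate modalities and minimize only the classification loss.
p_img = shared.forward(img_tokens)
p_txt = shared.forward(txt_tokens)
```

Because both modalities must satisfy the same classifier, their projected embeddings are pushed toward a common semantic space as a side effect of classification, without an explicit inter-modal consistency term.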

Citation (APA)

Yang, Y., Zhang, C., Xu, Y. C., Yu, D., Zhan, D. C., & Yang, J. (2021). Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3300–3306). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/454
