Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective

Abstract

The main challenge of cross-modal retrieval is to learn consistent embeddings for heterogeneous modalities. To solve this problem, traditional label-wise cross-modal approaches usually constrain inter-modal and intra-modal embedding consistency by relying on the label ground-truths. However, experiments reveal that different modal networks actually have different generalization capacities, so end-to-end joint training with a consistency loss usually leads to sub-optimal uni-modal models, which in turn hinders the learning of consistent embeddings. Therefore, in this paper, we argue that what is really needed for supervised cross-modal retrieval is a good shared classification model. In other words, we learn consistent embeddings by ensuring the classification performance of each modality on the shared model, without any consistency loss. Specifically, we propose a technique called Semantic Sharing, which trains the two modalities interactively through a shared self-attention based classification model. We evaluate the proposed approach on three representative datasets. The results validate that semantic sharing consistently boosts performance under the NDCG metric.
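The core idea above, routing features from both modalities through one shared self-attention classifier rather than tying them together with a consistency loss, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation; the dimensions, projection matrices, and class names are illustrative assumptions, with only lightweight modality-specific projections kept outside the shared model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedSelfAttentionClassifier:
    """Self-attention block + classifier head shared by both modalities."""
    def __init__(self, d, n_classes):
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wc = rng.standard_normal((d, n_classes)) / np.sqrt(d)

    def forward(self, tokens):
        # tokens: (n_tokens, d) -- projected features from either modality
        q, k, v = tokens @ self.Wq, tokens @ self.Wk, tokens @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))
        pooled = (attn @ v).mean(axis=0)          # aggregate attended tokens
        return softmax(pooled @ self.Wc)          # class probabilities

d, n_classes = 16, 5
shared = SharedSelfAttentionClassifier(d, n_classes)

# Modality-specific projections are the only non-shared parameters here
# (hypothetical sizes: e.g. 2048-d CNN regions, 300-d word embeddings).
W_img = rng.standard_normal((2048, d)) / np.sqrt(2048)
W_txt = rng.standard_normal((300, d)) / np.sqrt(300)

img_tokens = rng.standard_normal((7, 2048)) @ W_img    # 7 image regions
txt_tokens = rng.standard_normal((12, 300)) @ W_txt    # 12 word tokens

# Both modalities pass through the SAME classifier; training would
# alternate modalities and minimize only the classification loss.
p_img = shared.forward(img_tokens)
p_txt = shared.forward(txt_tokens)
```

Because both modalities must satisfy the same classifier, their projected embeddings are pushed toward a common semantic space as a side effect of classification, without an explicit inter-modal consistency term.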

Citation (APA)

Yang, Y., Zhang, C., Xu, Y. C., Yu, D., Zhan, D. C., & Yang, J. (2021). Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3300–3306). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/454
