Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval


Abstract

Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling them to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
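
The abstract does not state the loss functions, so the following is only a minimal sketch of what soft-label alignment might look like, not the authors' implementation. It assumes the soft labels are obtained by softmax-normalizing similarity scores from a uni-modal pre-trained "teacher" model, and that the retrieval model's own similarity scores are pulled toward them with a KL divergence. The function names, the temperature parameter, and the choice of KL divergence are all hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn a list of similarity scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def soft_label_alignment_loss(teacher_sims, student_sims, temperature=1.0):
    """Average KL(teacher || student) over the rows of two similarity matrices.

    teacher_sims: similarities from a uni-modal pre-trained model (assumed
                  source of the soft labels).
    student_sims: similarities from the image-text retrieval model being
                  trained. For a cross-modal variant these would be
                  image-to-text scores; for a uni-modal variant, image-to-image
                  or text-to-text scores (an assumption about how CSA and USA
                  differ).
    """
    losses = []
    for teacher_row, student_row in zip(teacher_sims, student_sims):
        soft_labels = softmax(teacher_row, temperature)
        predictions = softmax(student_row, temperature)
        losses.append(kl_divergence(soft_labels, predictions))
    return sum(losses) / len(losses)

# Toy batch of two queries and three candidates each (made-up numbers).
teacher = [[0.9, 0.8, 0.1], [0.2, 0.9, 0.3]]
student = [[0.9, 0.1, 0.1], [0.2, 0.9, 0.8]]
loss = soft_label_alignment_loss(teacher, student, temperature=0.1)
```

In the first toy row the teacher rates the second candidate almost as highly as the first, so a hard one-hot label would treat it as a negative. The soft-label term instead penalizes the student for scoring it low, which is the intuition behind using soft labels to counter false negatives.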

Cite

Huang, H., Nie, Z., Wang, Z., & Shang, Z. (2024). Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 18298–18306). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i16.29789
