Deep hashing methods have achieved tremendous success in cross-modal retrieval owing to their low storage cost and fast retrieval speed. In real cross-modal retrieval applications, label information is hard to obtain, so unsupervised cross-modal hashing has recently attracted increasing attention. However, existing methods fail to exploit the intrinsic connections between images and their corresponding descriptions or tags (the text modality). In this paper, we propose a novel Deep Semantic-Alignment Hashing (DSAH) method for unsupervised cross-modal retrieval, which fully exploits co-occurring image-text pairs. DSAH explores the similarity information of the different modalities, and we design a semantic-alignment loss function that aligns the similarities between features with those between hash codes. Moreover, to further bridge the modality gap, we propose to reconstruct the features of one modality from the hash codes of the other. Extensive experiments on three cross-modal retrieval datasets demonstrate that DSAH achieves state-of-the-art performance.
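The abstract does not give the exact form of the semantic-alignment loss, so the following is only a minimal PyTorch sketch of the stated idea: similarities computed over deep features are used as soft supervision for similarities computed over (relaxed) hash codes. The function name semantic_alignment_loss, its arguments, the cosine-similarity matrices, the equal-weight fusion of the two modalities, and the mean-squared penalty are all assumptions for illustration, not the paper's actual formulation.

import torch
import torch.nn.functional as F

def cosine_similarity_matrix(x):
    # Pairwise cosine similarities within a mini-batch of feature vectors.
    x = F.normalize(x, dim=1)
    return x @ x.t()

def semantic_alignment_loss(img_feat, txt_feat, img_code, txt_code):
    # img_feat, txt_feat: deep features from the image / text encoders.
    # img_code, txt_code: real-valued relaxations (e.g. tanh outputs) of the binary codes.
    # Feature-level similarity matrices, fused with equal weights (assumption),
    # act as soft supervision for the code-level similarities.
    s_feat = 0.5 * (cosine_similarity_matrix(img_feat)
                    + cosine_similarity_matrix(txt_feat))

    # Hash-code similarity matrices: intra-modal and cross-modal.
    c_img = cosine_similarity_matrix(img_code)
    c_txt = cosine_similarity_matrix(txt_code)
    c_cross = F.normalize(img_code, dim=1) @ F.normalize(txt_code, dim=1).t()

    # Align code similarities with feature similarities
    # (mean-squared penalty is an assumption, not the paper's choice).
    return (F.mse_loss(c_img, s_feat)
            + F.mse_loss(c_txt, s_feat)
            + F.mse_loss(c_cross, s_feat))

The reconstruction idea mentioned in the abstract would, in the same spirit, decode the hash codes of one modality back into the features of the other and penalize the reconstruction error; that component is likewise only sketched conceptually here.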
Yang, D., Wu, D., Zhang, W., Zhang, H., Li, B., & Wang, W. (2020). Deep semantic-alignment hashing for unsupervised cross-modal retrieval. In ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 44–52). Association for Computing Machinery, Inc. https://doi.org/10.1145/3372278.3390673