Cross-modal hashing is an effective and practical approach to large-scale multimedia retrieval. Unsupervised hashing, a strong candidate for cross-modal hashing, has received growing attention because unlabeled data are easy to collect. However, although there is a rich line of such work in academia, existing methods share a common disadvantage: the training data must exist in pairs to connect different modalities (e.g., an image and a text that carry the same semantic information), so learning cannot proceed when no pairwise information is available. To overcome this limitation, we design a Completely Unsupervised Cross-Modal Hashing (CUCMH) approach that requires only numeric features, i.e., neither class labels nor pairwise information. To the best of our knowledge, this is the first work to address this issue, for which we propose a novel dual-branch generative adversarial network. We also introduce the concept that the representation of multimedia data can be separated into content and style. The modality representation codes are employed to improve the effectiveness of the generative adversarial learning. Extensive experiments demonstrate the superior performance of CUCMH on completely unsupervised cross-modal hashing tasks and the effectiveness of integrating modality representation with semantic information in representation learning.
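As a generic illustration of why hash codes make large-scale cross-modal retrieval practical, the following minimal sketch ranks text items against an image query by Hamming distance. It is not the CUCMH network: it assumes that some encoders have already mapped both modalities into a shared real-valued space, and the toy vectors, the sign-based binarization, and the names `binarize`, `hamming`, and `retrieve` are hypothetical choices made only for this example.

```python
# Generic cross-modal hash retrieval sketch; not the CUCMH network.
# Assumption: learned encoders already produced the real-valued
# embeddings below; here they are hard-coded toy values.

def binarize(vec):
    """Turn a real-valued embedding into a binary code via its signs."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))

# Hypothetical text embeddings (database) and one image embedding (query).
text_embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.8, -0.1, 0.3],
    [0.7, -0.6, -0.3, -0.9],
]
image_embedding = [0.8, -0.1, 0.5, -0.4]

text_codes = [binarize(v) for v in text_embeddings]
image_code = binarize(image_embedding)
ranking = retrieve(image_code, text_codes)
```

Because the codes are short binary strings, ranking needs only bitwise comparisons, which is what keeps retrieval cheap as the database grows. How the codes are learned without labels or pairs is the part the paper addresses, and this sketch does not attempt it.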
CITATION STYLE
Duan, J., Zhang, P., & Huang, Z. (2020). Completely Unsupervised Cross-Modal Hashing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12112 LNCS, pp. 178–194). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59410-7_11