Comprehensive distance-preserving autoencoders for cross-modal retrieval

Yibing Zhan; Rong Zhang; Jun Yu; Dacheng Tao; Zhou Yu; Qi Tian

Conference Proceedings

MM 2018 - Proceedings of the 2018 ACM Multimedia Conference (2018) 1137-1145

DOI: 10.1145/3240508.3240607

Abstract

In this paper, we propose comprehensive distance-preserving autoencoders (CDPAE), a novel method for unsupervised cross-modal retrieval. Previous unsupervised methods rely primarily on pairwise distances between representations that are extracted from cross-media spaces, co-occur, and belong to the same objects. In addition to pairwise distances, the CDPAE considers heterogeneous distances between representations extracted from cross-media spaces and homogeneous distances between representations extracted from single media spaces, in both cases for representations that belong to different objects. The CDPAE consists of four components. First, denoising autoencoders retain the information in the representations and reduce the negative influence of redundant noise. Second, a comprehensive distance-preserving common space is proposed to explore the correlations among different representations. It aims to preserve the distances between representations within the common space so that they are consistent with the distances in their original media spaces. Third, a novel joint loss function simultaneously calculates the reconstruction loss of the denoising autoencoders and the correlation loss of the comprehensive distance-preserving common space. Finally, an unsupervised cross-modal similarity measurement is proposed to further improve retrieval performance. It calculates the marginal probability of two media objects based on a kNN classifier. The CDPAE is tested on four public datasets with two cross-modal retrieval tasks: “query images by texts” and “query texts by images”. Compared with eight state-of-the-art cross-modal retrieval methods, the experimental results demonstrate that the CDPAE outperforms all the unsupervised methods and performs competitively with the supervised methods.
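The joint loss described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' formulation: the abstract gives no equations, so the function names, the squared-error form of each term, the weight `alpha`, and in particular the target used for heterogeneous distances (the mean of the two single-modality distance matrices) are all assumptions made for illustration only.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def joint_loss(x_img, x_txt, z_img, z_txt, r_img, r_txt, alpha=1.0):
    """Hypothetical joint loss: reconstruction term plus distance-preserving terms.

    x_*: original representations (row i of x_img co-occurs with row i of x_txt)
    z_*: encodings in the common space
    r_*: autoencoder reconstructions of x_*
    alpha: assumed weight on the correlation terms
    """
    # Reconstruction loss of the two autoencoders.
    rec = np.mean((x_img - r_img) ** 2) + np.mean((x_txt - r_txt) ** 2)

    # Pairwise term: co-occurring image/text encodings should be close.
    pair = np.mean(np.sum((z_img - z_txt) ** 2, axis=1))

    # Homogeneous term: within-modality distances in the common space
    # should match the distances in the original media space.
    d_img = pairwise_dist(x_img, x_img)
    d_txt = pairwise_dist(x_txt, x_txt)
    homo = (np.mean((pairwise_dist(z_img, z_img) - d_img) ** 2)
            + np.mean((pairwise_dist(z_txt, z_txt) - d_txt) ** 2))

    # Heterogeneous term: cross-modal distances in the common space.
    # ASSUMPTION: the original spaces have different dimensions, so the
    # target is taken as the mean of the two single-modality distances.
    target = 0.5 * (d_img + d_txt)
    hetero = np.mean((pairwise_dist(z_img, z_txt) - target) ** 2)

    return rec + alpha * (pair + homo + hetero)
```

In this sketch the loss is zero only when the reconstructions are exact, paired encodings coincide, and all common-space distances equal their targets. A real training setup would compute the same terms in an automatic-differentiation framework and feed noise-corrupted inputs to the encoders.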

Cite

Zhan, Y., Zhang, R., Yu, J., Tao, D., Yu, Z., & Tian, Q. (2018). Comprehensive distance-preserving autoencoders for cross-modal retrieval. In MM 2018 - Proceedings of the 2018 ACM Multimedia Conference (pp. 1137–1145). Association for Computing Machinery, Inc. https://doi.org/10.1145/3240508.3240607
