Cross-media retrieval has attracted considerable attention and has become an increasingly worthwhile research direction in information retrieval. Unlike many related works, which perform retrieval by mapping heterogeneous data into a common representation subspace with a pair of projection matrices, we feed multi-modal media data into a deep sparse neural network pre-trained by restricted Boltzmann machines and use its output as a semantic understanding for semantic matching (RSNN-SM). Each heterogeneous modality is thus represented by its top-level semantic output, and cross-media retrieval is performed by measuring the semantic similarities between these outputs. Experimental results on several real-world datasets show that RSNN-SM achieves the best performance and outperforms state-of-the-art approaches.
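To illustrate the retrieval step described above, here is a minimal sketch (not the authors' code) of matching one modality against another by comparing top-level semantic vectors. It assumes each item has already been mapped to a semantic vector by the pre-trained network, and uses cosine similarity as one common choice of similarity measure:

```python
import numpy as np

def cosine_similarity(a, b):
    # Row-wise cosine similarity between two sets of semantic vectors.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def rank_cross_media(query_sem, gallery_sem):
    # For each query vector (e.g. from images), rank all gallery
    # vectors (e.g. from texts) by descending semantic similarity.
    sims = cosine_similarity(query_sem, gallery_sem)
    return np.argsort(-sims, axis=1)

# Toy top-level semantic outputs (hypothetical 3-class vectors).
img_sem = np.array([[0.90, 0.05, 0.05],
                    [0.10, 0.80, 0.10]])
txt_sem = np.array([[0.10, 0.85, 0.05],
                    [0.88, 0.07, 0.05]])

ranking = rank_cross_media(img_sem, txt_sem)
# Each image query retrieves the text whose semantic vector is closest.
```

The toy vectors here stand in for the network's top-level outputs; in the paper these would come from the RBM-pretrained sparse network applied to each modality.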
CITATION STYLE
Zhang, B., Zhang, H., Sun, J., Wang, Z., Wu, H., & Dong, X. (2018). Cross-media semantic matching via sparse neural network pre-trained by deep restricted boltzmann machines. In Communications in Computer and Information Science (Vol. 819, pp. 280–289). Springer Verlag. https://doi.org/10.1007/978-981-10-8530-7_27