2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning

0Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data of various modalities from a given query of one modality. The predominant challenge is to alleviate the “heterogeneous gap” between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure the cross-modal similarity. However, semantically relevant data from different modalities usually contains imbalanced information. Aligning all the modalities in the same space will weaken modal-specific semantics and introduce unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, the cross-modal similarity is measured by directly analyzing the fine-grained correlations between the text features and image features without common semantic space learning. Specifically, we preliminarily construct a cross-modal feature matrix to fuse the original visual and textural features. Then the 2D-convolutional networks are proposed to reason about inner-group relationships among features across modalities, resulting in fine-grained text-image representations. The cross-modal similarity is measured by a multi-layer perception based on the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, i.e. English Wikipedia and TVGraz. Experimental results indicate that our model outperforms state-of-the-art methods significantly. Meanwhile, the proposed cross-modal feature fusion approach is more effective in the CMIR tasks compared with other feature fusion approaches.

Cite

CITATION STYLE

APA

Guo, J., Yu, J., Lu, Y., Hu, Y., & Liu, Y. (2019). 2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11537 LNCS, pp. 131–144). Springer Verlag. https://doi.org/10.1007/978-3-030-22741-8_10

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free