2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning

Jingjing Guo; Jing Yu; Yuhang Lu; Yue Hu; Yanbing Liu

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), Vol. 11537 LNCS, pp. 131–144

DOI: 10.1007/978-3-030-22741-8_10

Abstract

Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data of various modalities given a query in one modality. The predominant challenge is to bridge the “heterogeneous gap” between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure cross-modal similarity there. However, semantically relevant data from different modalities usually contain imbalanced information, so aligning all modalities in the same space weakens modality-specific semantics and introduces unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, cross-modal similarity is measured by directly analyzing the fine-grained correlations between text features and image features, without learning a common semantic space. Specifically, we first construct a cross-modal feature matrix that fuses the original visual and textual features. 2D-convolutional networks then reason about inner-group relationships among features across modalities, yielding fine-grained text-image representations. Finally, a multi-layer perceptron measures the cross-modal similarity from the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, English Wikipedia and TVGraz. Experimental results indicate that our model significantly outperforms state-of-the-art methods, and that the proposed cross-modal feature fusion approach is more effective for CMIR tasks than other feature fusion approaches.
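The pipeline the abstract outlines (fuse the two feature vectors into a matrix, apply a 2D convolution, score the result with a multi-layer perceptron) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not say how the cross-modal feature matrix is built, so the outer-product fusion below is an assumption, as are the function names `fuse`, `conv2d_valid`, `mlp_score`, and `similarity`, the single 3x3 kernel, the layer sizes, and the random untrained weights.

```python
import math
import random


def fuse(text_vec, image_vec):
    """Assumed fusion step: outer product, giving a len(text) x len(image) matrix."""
    return [[t * v for v in image_vec] for t in text_vec]


def conv2d_valid(matrix, kernel):
    """Single-channel 2D cross-correlation (stride 1, no padding) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(matrix[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def mlp_score(features, w_hidden, w_out):
    """One hidden ReLU layer, then a sigmoid unit giving a score in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(ws, features)))
              for ws in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def similarity(text_vec, image_vec, kernel, w_hidden, w_out):
    """Fuse, convolve, flatten, and score one text-image pair."""
    feature_map = conv2d_valid(fuse(text_vec, image_vec), kernel)
    flat = [x for row in feature_map for x in row]
    return mlp_score(flat, w_hidden, w_out)


# Toy inputs and untrained random weights (all values are illustrative).
rng = random.Random(0)
text_vec = [rng.uniform(-1, 1) for _ in range(6)]
image_vec = [rng.uniform(-1, 1) for _ in range(8)]
kernel = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
n_flat = (6 - 3 + 1) * (8 - 3 + 1)  # size of the flattened feature map
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_flat)] for _ in range(4)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(4)]

score = similarity(text_vec, image_vec, kernel, w_hidden, w_out)
print(score)
```

In this sketch the similarity is computed directly from the fused pair, so no shared embedding space is learned. In practice the kernel and perceptron weights would be trained on matched and mismatched text-image pairs; the random weights here only demonstrate the data flow.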

Cite (APA)

Guo, J., Yu, J., Lu, Y., Hu, Y., & Liu, Y. (2019). 2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11537 LNCS, pp. 131–144). Springer Verlag. https://doi.org/10.1007/978-3-030-22741-8_10
