Cross-modal image-text retrieval with multitask learning

21 citations · 22 Mendeley readers
Abstract

In this paper, we propose a multi-task learning approach for cross-modal image-text retrieval. First, a correlation network is proposed for the relation recognition task, which helps learn the complicated relations and shared information across modalities. Then, we propose a correspondence cross-modal autoencoder for the cross-modal input reconstruction task, which correlates the hidden representations of two uni-modal autoencoders. In addition, to further improve retrieval performance, two regularization terms (variance and consistency constraints) are applied to the cross-modal embeddings so that the learned common information has large variance and is modality-invariant. Finally, to enable large-scale cross-modal similarity search, a flexible binary transform network converts the text and image embeddings into binary codes. Extensive experiments on two benchmark datasets demonstrate that our model consistently outperforms strong baseline methods. Source code is available at https://github.com/daerv/DAEVR.
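The final stage described in the abstract — binarizing common-space embeddings for fast Hamming-distance search — can be sketched as follows. This is a minimal illustration, not the paper's method: the random linear projections stand in for the trained uni-modal encoders, and sign thresholding stands in for the learned binary transform network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: raw image/text features, common embedding space, corpus size.
dim_img, dim_txt, dim_common, n = 512, 300, 64, 100

# Hypothetical stand-ins for the trained encoders (random projections here).
W_img = rng.normal(size=(dim_img, dim_common))
W_txt = rng.normal(size=(dim_txt, dim_common))

images = rng.normal(size=(n, dim_img))
texts = rng.normal(size=(n, dim_txt))

img_emb = images @ W_img   # image embeddings in the common space
txt_emb = texts @ W_txt    # text embeddings in the common space

# Binary transform: sign thresholding turns real-valued embeddings
# into compact binary codes suitable for Hamming-distance search.
img_codes = (img_emb > 0).astype(np.uint8)
txt_codes = (txt_emb > 0).astype(np.uint8)

def hamming_retrieval(query_code, codes, k=5):
    """Return indices of the k codes closest to query_code in Hamming distance."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Text-to-image retrieval: find the 5 images nearest to the first text query.
top5 = hamming_retrieval(txt_codes[0], img_codes)
```

With learned (rather than random) projections, matched image-text pairs would land on nearby codes, so the same Hamming search retrieves semantically related items at a fraction of the cost of real-valued similarity search.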




CITATION STYLE

APA

Luo, J., Shen, Y., Ao, X., Zhao, Z., & Yang, M. (2019). Cross-modal image-text retrieval with multitask learning. In International Conference on Information and Knowledge Management, Proceedings (pp. 2309–2312). Association for Computing Machinery. https://doi.org/10.1145/3357384.3358104


Readers' Seniority

PhD / Post grad / Masters / Doc: 8 (67%)
Professor / Associate Prof.: 2 (17%)
Lecturer / Post doc: 1 (8%)
Researcher: 1 (8%)

Readers' Discipline

Computer Science: 11 (85%)
Business, Management and Accounting: 1 (8%)
Arts and Humanities: 1 (8%)
