Annotation efficient cross-modal retrieval with adversarial attentive alignment


Abstract

Visual-semantic embeddings are central to many multimedia applications, such as cross-modal retrieval between visual data and natural language descriptions. Conventionally, learning a joint embedding space relies on large parallel multimodal corpora. Since massive human annotation is expensive to obtain, there is strong motivation to develop versatile algorithms that learn from large corpora with fewer annotations. In this paper, we propose a novel framework that leverages regional semantics, automatically extracted from un-annotated images, as additional weak supervision for learning visual-semantic embeddings. The proposed model employs adversarial attentive alignment to close the inherent heterogeneous gap between the annotated and un-annotated portions of the visual and textual domains. To demonstrate its superiority, we conduct extensive experiments on sparsely annotated multimodal corpora. The results show that the proposed model outperforms state-of-the-art visual-semantic embedding models by a significant margin on cross-modal retrieval over the sparsely annotated Flickr30k and MS-COCO datasets. Notably, despite using only 20% of the annotations, the proposed model achieves competitive performance (Recall@10 > 80.0% on 1K and > 70.0% on 5K text-to-image retrieval) compared to benchmarks trained with the complete annotations.
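
The abstract omits the model details, but the general recipe it describes, a shared embedding space trained with a ranking loss plus an adversarial discriminator that aligns annotated and un-annotated data, can be sketched. Below is a minimal PyTorch sketch, not the authors' implementation: every dimension, module, and hyperparameter (EMBED_DIM, MARGIN, the GRU text encoder, the discriminator shape) is an assumption chosen for illustration.

```python
# Minimal sketch (NOT the authors' code): a joint visual-semantic embedding
# with a hinge-based triplet ranking loss, plus a domain discriminator of the
# kind used to adversarially align annotated and un-annotated embeddings.
# All dimensions, architectures, and hyperparameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 512      # size of the shared embedding space (assumed)
IMG_FEAT_DIM = 2048  # pre-extracted CNN image/region features (assumed)
TXT_FEAT_DIM = 300   # word-embedding inputs to the text encoder (assumed)
MARGIN = 0.2         # ranking-loss margin (assumed)

class ImageEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(IMG_FEAT_DIM, EMBED_DIM)
    def forward(self, v):
        # project image features into the joint space, on the unit sphere
        return F.normalize(self.fc(v), dim=-1)

class TextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(TXT_FEAT_DIM, EMBED_DIM, batch_first=True)
    def forward(self, w):
        _, h = self.gru(w)              # final hidden state as the sentence code
        return F.normalize(h.squeeze(0), dim=-1)

class DomainDiscriminator(nn.Module):
    """Tries to tell annotated from un-annotated embeddings; the encoders are
    trained to fool it, closing the gap between the two portions of the data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMBED_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, z):
        return self.net(z)              # logit: annotated vs. un-annotated

def triplet_ranking_loss(img, txt, margin=MARGIN):
    """Bidirectional hinge loss over in-batch hardest negatives."""
    scores = img @ txt.t()              # cosine similarities (unit vectors)
    pos = scores.diag().unsqueeze(1)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_t = (margin + scores - pos).clamp(min=0).masked_fill(mask, 0)
    cost_i = (margin + scores - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_t.max(1)[0].mean() + cost_i.max(0)[0].mean()

# Toy usage: random tensors stand in for real image features and captions.
img_enc, txt_enc, disc = ImageEncoder(), TextEncoder(), DomainDiscriminator()
v = img_enc(torch.randn(8, IMG_FEAT_DIM))      # annotated images
t = txt_enc(torch.randn(8, 12, TXT_FEAT_DIM))  # their paired captions
u = img_enc(torch.randn(8, IMG_FEAT_DIM))      # un-annotated images

rank_loss = triplet_ranking_loss(v, t)
d_logits = disc(torch.cat([v, u]))
d_labels = torch.cat([torch.ones(8, 1), torch.zeros(8, 1)])
adv_loss = F.binary_cross_entropy_with_logits(d_logits, d_labels)
print(rank_loss.item(), adv_loss.item())
```

In a full training loop the encoders would be optimized to maximize the discriminator's loss (e.g., via a gradient-reversal layer) while minimizing the ranking loss; the sketch only computes both losses to show the moving parts, and it omits the paper's attentive alignment over regional semantics.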

Citation (APA)

Huang, P. Y., Kang, G., Liu, W., Chang, X., & Hauptmann, A. G. (2019). Annotation efficient cross-modal retrieval with adversarial attentive alignment. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 1758–1767). Association for Computing Machinery, Inc. https://doi.org/10.1145/3343031.3350894
