LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

Abstract

Multimodal pre-training has driven major advances in vision-and-language (V+L) research. These large-scale pre-trained models, although successful, suffer from slow inference because of the enormous computation cost of cross-modal attention in the Transformer architecture. In real-life applications, such latency and computation demands severely deter the practical use of pre-trained models. In this paper, we study image-text retrieval (ITR), the most mature V+L application scenario, which was widely studied even before the emergence of recent pre-trained models. We propose a simple yet highly effective approach, LightningDOT, that accelerates ITR inference by thousands of times without sacrificing accuracy. LightningDOT removes the time-consuming cross-modal attention by pre-training on three novel learning objectives, extracting feature indexes offline, and employing instant dot-product matching with further re-ranking, which significantly speeds up the retrieval process. In fact, LightningDOT achieves new state of the art across multiple ITR benchmarks such as Flickr30k, COCO and Multi30K, outperforming existing pre-trained models that consume 1000× the computational hours.
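
The retrieval pattern named in the abstract (offline feature index, dot-product matching, then re-ranking of a short candidate list) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are random stand-ins for the outputs of separate image and text encoders, and `build_index`, `retrieve`, `rerank`, and `slow_scorer` are hypothetical names chosen for illustration. The abstract does not specify the re-ranker, so a placeholder scoring function is used.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(image_embeddings):
    """Offline step: normalize and cache every image embedding once."""
    return [normalize(v) for v in image_embeddings]


def retrieve(query_embedding, index, k=5):
    """Online step: one dot product per indexed image, keep the top k."""
    q = normalize(query_embedding)
    scored = ((dot(q, v), i) for i, v in enumerate(index))
    return [(i, s) for s, i in heapq.nlargest(k, scored)]


def rerank(candidates, slow_scorer):
    """Optional step: reorder only the k candidates with a costlier scorer."""
    rescored = ((i, slow_scorer(i)) for i, _ in candidates)
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Stand-in data (assumption): 200 random 32-dimensional "image" embeddings.
random.seed(0)
image_embeddings = [[random.gauss(0, 1) for _ in range(32)] for _ in range(200)]
index = build_index(image_embeddings)

# A "text" query placed near image 42, so the expected best match is known.
query = [v + 0.01 * random.gauss(0, 1) for v in image_embeddings[42]]
candidates = retrieve(query, index, k=5)

# Placeholder re-ranker (assumption): any slower, more accurate scorer fits here.
reranked = rerank(candidates, slow_scorer=lambda i: -abs(i - 42))
```

Because the image side is computed once and cached, the per-query cost is a single text encoding plus dot products against the index. The expensive scorer runs on only `k` candidates rather than on every image-text pair.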

Cite

Sun, S., Chen, Y. C., Li, L., Wang, S., Fang, Y., & Liu, J. (2021). LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 982–997). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.77
