CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Abstract

Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and achieve state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need more space to store their indexes compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route each token vector to the predicted lexical “keys” such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Source code and data are available at https://github.com/facebookresearch/dpr-scale/tree/citadel.
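
The routing idea in the abstract can be illustrated with a small, simplified sketch. It is not the released dpr-scale code. It assumes that each token has already been routed to a few weighted lexical keys, and that a matching query/document token pair contributes the product of both routing weights times their vector dot product. Under those assumptions, a query token only interacts with document tokens that share one of its keys, and the sketch contrasts that with ColBERT-style all-to-all MaxSim. The router (`route`, a top-k over positive logits), every function name, and the toy vectors and weights are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A token = (contextual vector, {lexical key: routing weight}).
RoutedToken = Tuple[Vector, Dict[str, float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def route(key_logits: Dict[str, float], k: int) -> Dict[str, float]:
    """Hypothetical router: keep the k highest-scoring keys with positive logits."""
    positive = [(key, w) for key, w in key_logits.items() if w > 0]
    positive.sort(key=lambda kw: kw[1], reverse=True)
    return dict(positive[:k])


def all_to_all_maxsim(query: List[Vector], doc: List[Vector]) -> Tuple[float, int]:
    """ColBERT-style late interaction: every query token meets every doc token."""
    score, comparisons = 0.0, 0
    for q in query:
        score += max(dot(q, d) for d in doc)
        comparisons += len(doc)
    return score, comparisons


def build_key_index(doc: List[RoutedToken]) -> Dict[str, List[Tuple[float, Vector]]]:
    """Group document token vectors by the lexical keys they were routed to."""
    index: Dict[str, List[Tuple[float, Vector]]] = defaultdict(list)
    for vec, keys in doc:
        for key, weight in keys.items():
            index[key].append((weight, vec))
    return index


def routed_score(query: List[RoutedToken],
                 index: Dict[str, List[Tuple[float, Vector]]]) -> Tuple[float, int]:
    """Each query token only interacts with doc tokens routed to the same key."""
    score, comparisons = 0.0, 0
    for q_vec, q_keys in query:
        for key, q_w in q_keys.items():
            candidates = index.get(key, [])
            if not candidates:
                continue  # no shared key: no interaction and no cost
            # Weighted max over the doc tokens that share this key.
            score += max(q_w * d_w * dot(q_vec, d_vec) for d_w, d_vec in candidates)
            comparisons += len(candidates)
    return score, comparisons


if __name__ == "__main__":
    query = [([1.0, 0.0], route({"retrieval": 2.0, "fast": -1.0}, k=1)),
             ([0.0, 1.0], route({"vector": 0.5, "index": 0.1}, k=1))]
    doc = [([1.0, 0.0], {"retrieval": 0.8}),
           ([0.5, 0.5], {"vector": 1.0}),
           ([0.0, 1.0], {"fast": 1.0})]
    print(all_to_all_maxsim([v for v, _ in query], [v for v, _ in doc]))  # (2.0, 6)
    print(routed_score(query, build_key_index(doc)))  # only 2 dot products instead of 6
```

In this toy example the all-to-all scorer computes six dot products and the routed scorer computes two. That reduction is the kind of saving the abstract attributes to conditional token interaction. Grouping document vectors by key works much like an inverted index, so the cost grows with the number of shared keys rather than with the product of query and document lengths.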

Citation (APA)

Li, M., Lin, S. C., Oguz, B., Ghoshal, A., Lin, J., Mehdad, Y., … Chen, X. (2023). CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 11891–11907). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.663
