SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval


Abstract

In dense retrieval, prior work has largely improved retrieval effectiveness using multi-vector dense representations, exemplified by ColBERT. In sparse retrieval, more recent work such as SPLADE has demonstrated that learned sparse lexical representations can achieve comparable effectiveness while offering better interpretability. In this work, we combine the strengths of both sparse and dense representations for first-stage retrieval. Specifically, we propose SparseEmbed, a novel retrieval model that learns sparse lexical representations with contextual embeddings. Compared with SPLADE, our model leverages contextual embeddings to improve expressiveness. Compared with ColBERT, our sparse representations are trained end-to-end to optimize both efficiency and effectiveness.
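To make the comparison concrete, one plausible reading of the abstract is: a SPLADE-style encoder activates a small set of vocabulary terms for the query and for the document, each activated term carries a learned contextual embedding, and the relevance score aggregates embedding dot products over the terms the two sides share. The sketch below is a minimal illustration under that reading; the function name sparse_embed_score, the dict-of-embeddings interface, and the 64-dimensional toy embeddings are our own assumptions, not the paper's implementation.

```python
import torch

def sparse_embed_score(query_terms: dict[int, torch.Tensor],
                       doc_terms: dict[int, torch.Tensor]) -> torch.Tensor:
    """Sum of contextual-embedding dot products over shared activated terms.

    Each mapping goes from a vocabulary term id (activated by the sparse
    encoder) to that term's contextual embedding. This interface is an
    illustrative assumption, not the paper's actual API.
    """
    score = torch.zeros(())
    for term_id, q_emb in query_terms.items():
        d_emb = doc_terms.get(term_id)
        if d_emb is not None:  # only terms activated on both sides contribute
            score = score + torch.dot(q_emb, d_emb)
    return score

# Toy example: term id 42 is activated on both sides, so only its
# embedding pair contributes to the score.
query = {42: torch.randn(64), 7: torch.randn(64)}
doc = {42: torch.randn(64), 99: torch.randn(64)}
print(sparse_embed_score(query, doc))
```

Because only the intersection of activated terms contributes to such a score, representations of this form stay compatible with inverted-index retrieval, which is the usual efficiency argument for sparse lexical models over ColBERT-style all-token interaction.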

Citation (APA)

Kong, W., Dudek, J. M., Li, C., Zhang, M., & Bendersky, M. (2023). SparseEmbed: Learning sparse lexical representations with contextual embeddings for retrieval. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023) (pp. 2399–2403). Association for Computing Machinery. https://doi.org/10.1145/3539618.3592065
