SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

Eunseong Choi; Sunkyung Lee; Minjin Choi; Hyeseon Ko; Young In Song; Jongwuk Lee

International Conference on Information and Knowledge Management, Proceedings (2022) 272-282

DOI: 10.1145/3511808.3557456

Abstract

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Because they rely on a pre-computed inverted index, they support fast ad-hoc search but suffer from the vocabulary mismatch problem. Although recent neural ranking models built on pre-trained language models can address this problem, they usually incur expensive query inference costs, which implies a trade-off between effectiveness and efficiency. To tackle this trade-off, we propose a novel uni-encoder ranking model, Sparse retriever using a Dual document Encoder (SpaDE), which learns document representations via a dual encoder. Each encoder plays a central role: one (i) adjusts the importance of terms to improve lexical matching, and the other (ii) expands additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively while avoiding unnecessary interference between the two encoders during training. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.
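The idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-document weights below are hand-written stand-ins for what a term-weighting encoder and a term-expansion encoder might output, the function names are hypothetical, and merging the two outputs with an element-wise maximum is an assumption chosen only for simplicity. The sketch shows why expansion helps with vocabulary mismatch while the query side stays a plain inverted-index lookup with no query-time model inference.

```python
from collections import defaultdict


def merge_sparse(weighting, expansion):
    """Merge two sparse term->weight maps, keeping the larger weight per term.

    The max-merge is an illustrative assumption, not the paper's rule.
    """
    merged = dict(weighting)
    for term, weight in expansion.items():
        merged[term] = max(merged.get(term, 0.0), weight)
    return merged


def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return index


def search(index, query_terms):
    """Score documents by summing the weights of exactly matched query terms."""
    scores = defaultdict(float)
    for term in set(query_terms):
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical encoder outputs (made-up numbers, for illustration only).
term_weights = {  # (i) importance of terms that occur in the document
    "d1": {"car": 1.5, "engine": 1.0, "the": 0.1},
    "d2": {"bank": 1.2, "river": 0.9},
}
expanded_terms = {  # (ii) related terms that do not occur in the document
    "d1": {"automobile": 0.8, "vehicle": 0.6},
    "d2": {"shore": 0.5},
}

doc_vectors = {
    doc_id: merge_sparse(term_weights[doc_id], expanded_terms[doc_id])
    for doc_id in term_weights
}
index = build_inverted_index(doc_vectors)
results = search(index, ["automobile", "engine"])

# Without expansion, the query term "automobile" matches nothing.
lexical_only = search(build_inverted_index(term_weights), ["automobile"])
```

In this toy setup the query term "automobile" reaches document d1 only through the expanded terms, whereas "engine" matches through the re-weighted original terms; all document-side work happens before the index is built.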

Cite

Choi, E., Lee, S., Choi, M., Ko, H., Song, Y. I., & Lee, J. (2022). SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval. In International Conference on Information and Knowledge Management, Proceedings (pp. 272–282). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557456
