Learning Robust Dense Retrieval Models from Incomplete Relevance Labels

Prafull Prakash; Julian Killingback; Hamed Zamani

Conference Proceedings | Open Access

SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021) 1728-1732

DOI: 10.1145/3404835.3463106


Abstract

The recent deployment of efficient billion-scale approximate nearest neighbor (ANN) search algorithms on GPUs has motivated information retrieval researchers to develop neural ranking models that learn low-dimensional dense representations for queries and documents and use ANN search for retrieval. However, optimizing these dense retrieval models poses several challenges, including negative sampling for (pair-wise) training. A recent model, called ANCE, successfully performs dynamic negative sampling via ANN search. This paper improves upon ANCE by proposing a robust negative sampling strategy for scenarios where the training data lacks complete relevance annotations. This is of particular importance because obtaining large-scale training data with complete relevance judgments is extremely expensive. Our model uses a small validation set with complete relevance judgments to accurately estimate a negative sampling distribution for dense retrieval models. We also explore leveraging a lexical matching signal during training and pseudo-relevance feedback during evaluation for improved performance. Our experiments on the TREC Deep Learning Track benchmarks demonstrate the effectiveness of our solutions.
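
The abstract does not say how the negative sampling distribution is estimated, so the following Python sketch is only one possible reading, not the authors' method. It assumes the distribution is defined over rank positions: on a small validation set with complete judgments, it measures how often the document at each rank is truly non-relevant, then uses those frequencies as sampling weights for unlabeled training candidates. The function names, the Laplace-style smoothing, and the rank-based formulation are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set


def estimate_nonrelevance_by_rank(
    rankings: Dict[str, Sequence[str]],
    qrels: Dict[str, Set[str]],
    depth: int,
    smoothing: float = 1.0,
) -> List[float]:
    """Smoothed fraction of validation queries whose document at each rank
    is non-relevant. Assumes qrels are complete for these queries."""
    nonrel = [0] * depth
    seen = [0] * depth
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        for pos, doc_id in enumerate(ranked[:depth]):
            seen[pos] += 1
            if doc_id not in relevant:
                nonrel[pos] += 1
    # Additive smoothing keeps every weight strictly positive (assumption).
    return [(nonrel[i] + smoothing) / (seen[i] + 2 * smoothing) for i in range(depth)]


def sample_negatives(
    ranked: Sequence[str],
    labeled_positives: Set[str],
    rank_weights: Sequence[float],
    k: int,
    rng: random.Random,
) -> List[str]:
    """Draw k negatives (with replacement) from a retrieved list, weighting
    each candidate by the estimated non-relevance of its rank position."""
    candidates: List[str] = []
    weights: List[float] = []
    for pos, doc_id in enumerate(ranked[: len(rank_weights)]):
        if doc_id in labeled_positives:
            continue  # never sample a known positive as a negative
        candidates.append(doc_id)
        weights.append(rank_weights[pos])
    if not candidates:
        return []
    return rng.choices(candidates, weights=weights, k=k)


# Toy validation set with complete judgments (illustrative data only).
val_rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
val_qrels = {"q1": {"a"}, "q2": {"d", "e"}}
weights = estimate_nonrelevance_by_rank(val_rankings, val_qrels, depth=3)

# Training query with a single labeled positive and incomplete labels.
negatives = sample_negatives(["x", "y", "z"], {"x"}, weights, k=5, rng=random.Random(0))
```

Under this reading, top-ranked unlabeled documents, which are the ones most likely to be unjudged positives, are sampled as negatives less often than lower-ranked ones. In an ANCE-style setup the ranked list would come from ANN search over the current model's embeddings.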

Cite

Prakash, P., Killingback, J., & Zamani, H. (2021). Learning Robust Dense Retrieval Models from Incomplete Relevance Labels. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1728–1732). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463106
