Noisy Pair Corrector for Dense Retrieval

Abstract

Most dense retrieval models rest on an implicit assumption: that the training query-document pairs are exactly matched. Because annotating a corpus manually is expensive, training pairs in real-world applications are usually collected automatically, which inevitably introduces mismatched-pair noise. In this paper, we explore an interesting and challenging problem in dense retrieval: how to train an effective model in the presence of mismatched-pair noise. To solve this problem, we propose a novel approach called Noisy Pair Corrector (NPC), which consists of a detection module and a correction module. The detection module estimates which pairs are noisy by calculating the perplexity between annotated positive and easy negative documents. The correction module uses an exponential moving average (EMA) model to provide a soft supervised signal, which helps mitigate the effects of noise. We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and on the code-search benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent performance in handling both synthetic and realistic noise.
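
The abstract does not spell out the details of the correction module, so the following is only a minimal sketch of the general idea of an EMA model supplying soft targets, not the authors' implementation. The names ema_update and soft_targets, the decay value, the temperature, and the use of a softmax over candidate scores are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               decay: float = 0.999) -> None:
    """Update the EMA (teacher) weights in place.

    Standard EMA rule: teacher <- decay * teacher + (1 - decay) * student.
    The decay value is an assumed hyperparameter, not taken from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = decay * teacher_values[i] + (1.0 - decay) * s


def soft_targets(teacher_scores: List[float],
                 temperature: float = 1.0) -> List[float]:
    """Turn the EMA model's query-document scores into a soft label.

    Assumption: the soft supervised signal is a softmax distribution over
    the candidate documents; the paper may define it differently.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    top = max(teacher_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in teacher_scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In a training loop built on this sketch, ema_update would be called after each optimizer step on the student model, and soft_targets would replace the one-hot label for pairs that the detection module flags as noisy.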

Cite

Zhang, H., Gong, Y., He, X., Liu, D., Guo, D., Lv, J., & Guo, J. (2023). Noisy Pair Corrector for Dense Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 11439–11451). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.765
