Synthetic Target Domain Supervision for Open Retrieval QA

Revanth Gangi Reddy; Bhavani Iyer; Md Arafat Sultan; Rong Zhang; Avirup Sil; Vittorio Castelli; Radu Florian; Salim Roukos

Conference Proceedings, Open Access

SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021) 1793-1797

DOI: 10.1145/3404835.3463085

Abstract

Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR), a state-of-the-art (SOTA) open-domain neural retrieval model, on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore fine-tuning it with synthetic training examples, which we generate from unlabeled target-domain text using a text-to-text generator. In our experiments, this noisy but fully automated target-domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.
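
The abstract does not say how the BM25 and DPR scores are combined in the ensemble. As an illustration only, the sketch below shows one common way to fuse a sparse and a dense retriever: min-max normalize each retriever's scores, then take a weighted sum. The function names, the equal weighting, and the toy scores are all assumptions made for this example and are not taken from the paper.

```python
def min_max(scores):
    """Rescale a {passage_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so none carries ranking information.
        return {pid: 0.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def fuse(bm25_scores, dense_scores, weight=0.5, top_k=3):
    """Weighted sum of normalized sparse and dense scores (hypothetical scheme).

    A passage returned by only one retriever contributes 0.0 for the other.
    """
    bm25_n = min_max(bm25_scores)
    dense_n = min_max(dense_scores)
    candidates = set(bm25_n) | set(dense_n)
    fused = {
        pid: weight * bm25_n.get(pid, 0.0) + (1 - weight) * dense_n.get(pid, 0.0)
        for pid in candidates
    }
    # Sort by descending fused score, breaking ties by passage id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy scores for one question; the values are made up for illustration.
bm25 = {"p1": 12.0, "p2": 7.0, "p3": 2.0}
dense = {"p2": 80.0, "p3": 78.0, "p4": 70.0}
ranking = fuse(bm25, dense)
print(ranking)
```

In this toy case a passage that both retrievers rank reasonably well (`p2`) ends up ahead of one that only BM25 favors (`p1`), which is the usual motivation for combining the two signals.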

Citation (APA)

Gangi Reddy, R., Iyer, B., Sultan, M. A., Zhang, R., Sil, A., Castelli, V., … Roukos, S. (2021). Synthetic Target Domain Supervision for Open Retrieval QA. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1793–1797). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463085
