Fine-Tuning BERT for COVID-19 Domain Ad-Hoc IR by Using Pseudo-qrels

Abstract

This work analyzes the feasibility of training a neural retrieval system for a collection of scientific papers about COVID-19 using pseudo-qrels extracted from the collection itself. We propose a method for generating pseudo-qrels that exploits two characteristics of scientific articles: a) the relationship between a paper's title and its abstract, and b) the relationships between articles established by sentences containing citations. From these signals we generate pseudo-queries together with their respective pseudo-positive (relevant) and pseudo-negative (non-relevant) example documents. The article retrieval process combines a ranking model based on term-matching techniques with a neural one based on pretrained BERT models, which are fine-tuned to the task on the generated pseudo-qrels. We compare BERT models from both the open domain and the biomedical domain, and we also compare fine-tuning on the generated pseudo-qrels against fine-tuning on the open-domain MS-Marco dataset. Results on the TREC-COVID collection show that the pseudo-qrels yield a significant improvement for the neural models, both over classic term-matching IR baselines and over neural systems trained on MS-Marco.
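To make the pseudo-qrel idea concrete, the following is a minimal sketch of how such training triples could be assembled from the two signals the abstract names: a paper's title paired with its own abstract, and a citation sentence paired with the cited paper's abstract. All names here (Paper, build_pseudo_qrels, sample_negatives) are hypothetical, and the random negative sampling is an assumption; the paper's actual procedure (e.g., harder, retrieval-based negatives) may differ.

```python
import random
from dataclasses import dataclass

@dataclass
class Paper:
    paper_id: str
    title: str
    abstract: str
    # Sentences in this paper that cite another paper, keyed by cited id.
    citation_sentences: dict

def sample_negatives(positive_id, by_id, k, rng):
    # Random negatives as a placeholder; the paper may use a different
    # (e.g., term-matching based) negative-sampling strategy.
    candidates = [pid for pid in by_id if pid != positive_id]
    return [by_id[pid].abstract
            for pid in rng.sample(candidates, min(k, len(candidates)))]

def build_pseudo_qrels(papers, negatives_per_query=4, seed=0):
    """Build (query, positive, negatives) triples from two signals:
    a) title -> own abstract, b) citation sentence -> cited abstract."""
    rng = random.Random(seed)
    by_id = {p.paper_id: p for p in papers}
    triples = []
    for p in papers:
        # Signal (a): the title acts as a pseudo-query whose relevant
        # document is the paper's own abstract.
        negs = sample_negatives(p.paper_id, by_id, negatives_per_query, rng)
        triples.append({"query": p.title, "positive": p.abstract,
                        "negatives": negs})
        # Signal (b): a sentence citing another paper acts as a pseudo-query
        # whose relevant document is the cited paper's abstract.
        for cited_id, sentence in p.citation_sentences.items():
            if cited_id not in by_id:
                continue
            negs = sample_negatives(cited_id, by_id, negatives_per_query, rng)
            triples.append({"query": sentence,
                            "positive": by_id[cited_id].abstract,
                            "negatives": negs})
    return triples
```

A BERT cross-encoder can then be fine-tuned on these triples as a binary relevance classifier over (query, document) pairs.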
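The two-stage retrieval pipeline the abstract describes, a term-matching first stage followed by a fine-tuned BERT reranker, could look roughly like the sketch below. BM25 (via the rank_bm25 package) stands in for the term-matching ranker and bert-base-uncased for the BERT checkpoint; both are assumptions, since the abstract fixes neither and the paper compares open-domain and biomedical BERT variants. In a real system the BM25 index would be built once, not per query.

```python
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint; in practice this would be a BERT model already
# fine-tuned on the pseudo-qrels (or on MS-Marco, for comparison).
MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)
model.eval()

def retrieve(query, docs, k_first_stage=100, k_final=10):
    """First stage: BM25 selects candidates by term matching.
    Second stage: a BERT cross-encoder rescores each (query, doc) pair."""
    bm25 = BM25Okapi([d.lower().split() for d in docs])  # built per call for brevity
    scores = bm25.get_scores(query.lower().split())
    candidates = sorted(range(len(docs)), key=lambda i: scores[i],
                        reverse=True)[:k_first_stage]
    reranked = []
    with torch.no_grad():
        for i in candidates:
            inputs = tokenizer(query, docs[i], truncation=True,
                               max_length=512, return_tensors="pt")
            logits = model(**inputs).logits
            # Probability of the "relevant" class as the reranking score.
            rel = torch.softmax(logits, dim=-1)[0, 1].item()
            reranked.append((rel, i))
    reranked.sort(reverse=True)
    return [docs[i] for _, i in reranked[:k_final]]
```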

Citation

Saralegi, X., & San Vicente, I. (2021). Fine-Tuning BERT for COVID-19 Domain Ad-Hoc IR by Using Pseudo-qrels. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12657 LNCS, pp. 376–383). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-72240-1_38
