BioMedBERT: A Pre-trained Biomedical Language Model for QA and IR

Abstract

The SARS-CoV-2 (COVID-19) pandemic spotlighted the importance of moving quickly with biomedical research. However, as the number of biomedical research papers continues to increase, the task of finding relevant articles to answer pressing questions has become a significant challenge. In this work, we propose a textual data mining tool that supports literature search to accelerate the work of researchers in the biomedical domain. We achieve this by building BioMedBERT, a neural-based deep contextual understanding model for Question-Answering (QA) and Information Retrieval (IR) tasks. We also leverage the new BREATHE dataset, one of the largest available datasets of biomedical research literature, containing abstracts and full-text articles from ten different biomedical literature sources, on which we pre-train our BioMedBERT model. Our work achieves state-of-the-art results on the QA fine-tuning task on the BioASQ 5b, 6b and 7b datasets. In addition, we observe more relevant results when BioMedBERT embeddings are used with Elasticsearch for the Information Retrieval task on the intelligently formulated BioASQ dataset. We believe our diverse dataset and our unique model architecture are what led us to achieve the state-of-the-art results for QA and IR tasks.

Cite

Chakraborty, S., Bisong, E., Bhatt, S., Wagner, T. O., Mosconi, F., & Elliott, R. D. (2020). BioMedBERT: A Pre-trained Biomedical Language Model for QA and IR. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 669–679). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.59
