Finding relevant biomedical datasets: The UC San Diego solution for the bioCADDIE Retrieval Challenge

This article is free to access.

Abstract

The number and diversity of biomedical datasets have grown rapidly in the last decade. Many datasets are stored in various repositories, in different formats. Existing dataset retrieval systems lack cross-repository search capability. As a result, users spend time searching for datasets in repositories they already know and typically do not discover new ones. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users' free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval.
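
For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of plain Normalized Discounted Cumulative Gain (NDCG) for a single ranked list. It is not the inferred variant used in the challenge, which estimates the score from incomplete relevance judgments, and it is not taken from the authors' pipeline. The function names `dcg` and `ndcg`, the graded gain values, and the choice to compute the ideal ranking from the retrieved list alone are all assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at each rank is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the same gains sorted best-first.

    Assumption: the ideal ranking is built only from the gains in the list,
    not from every relevant dataset in the collection.
    """
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of the datasets returned for one query,
# in ranked order (2 = relevant, 1 = partially relevant, 0 = not relevant).
score = ndcg([2, 0, 1])
```

A ranking that already lists the most relevant datasets first scores 1.0; placing relevant datasets lower in the list reduces the score.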

Cite

Wei, W., Ji, Z., He, Y., Zhang, K., Ha, Y., Li, Q., & Ohno-Machado, L. (2018). Finding relevant biomedical datasets: The UC San Diego solution for the bioCADDIE Retrieval Challenge. Database, 2018(2018). https://doi.org/10.1093/database/bay017
