Abstract
The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, in different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching for datasets in known repositories and typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users' free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that the pipeline is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval.
Cite
Wei, W., Ji, Z., He, Y., Zhang, K., Ha, Y., Li, Q., & Ohno-Machado, L. (2018). Finding relevant biomedical datasets: The UC San Diego solution for the bioCADDIE Retrieval Challenge. Database, 2018(2018). https://doi.org/10.1093/database/bay017