Metadata discovery of heterogeneous biomedical datasets using token-based features


Abstract

Metadata discovery is the process of recognizing the semantics and descriptors of data elements and datasets. This study uses a machine-learning approach to classify biomedical dataset characteristics for metadata discovery. Four common types of biomedical data sources were included: genetic variants, protein structures, scientific publications, and a general English corpus. Decision tree classification models were built using token-based features derived from these data files. The models identify the four data sources with average F1 scores ranging from 0.935 to 1.000. This study demonstrates that biomedical data of different types have different distributions of token-based document structural features, and that such structural features can be leveraged for metadata discovery.
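
The abstract does not list the token-based features the authors used, so the following is only a minimal sketch of what such features might look like. The token classes, the `token_features` helper, and both sample strings are hypothetical and chosen for illustration; they are not taken from the paper. The idea is that each file is summarized by the share of its tokens falling into a few coarse classes, and a vector like this could then be passed to a decision tree classifier.

```python
import re
from collections import Counter

# Hypothetical token classes; the paper's actual feature set is not
# described in the abstract.
TOKEN_CLASSES = ("alpha", "numeric", "alnum", "other")


def classify_token(token: str) -> str:
    """Assign a whitespace-delimited token to one coarse, illustrative class."""
    if token.isalpha():
        return "alpha"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", token):
        return "numeric"
    if token.isalnum():
        return "alnum"
    return "other"


def token_features(text: str) -> dict[str, float]:
    """Return the fraction of tokens in each class, plus mean token length."""
    tokens = text.split()
    if not tokens:
        return {**{c: 0.0 for c in TOKEN_CLASSES}, "mean_len": 0.0}
    counts = Counter(classify_token(t) for t in tokens)
    features = {c: counts[c] / len(tokens) for c in TOKEN_CLASSES}
    features["mean_len"] = sum(len(t) for t in tokens) / len(tokens)
    return features


# Made-up snippets, loosely shaped like a variant record and an English sentence.
variant_like = "chr1 10177 rs367896724 A AC 100 PASS"
english_like = "The quick brown fox jumps over the lazy dog"

print(token_features(variant_like))
print(token_features(english_like))
```

Even on these two toy inputs the class fractions differ sharply: the English sentence consists entirely of alphabetic tokens, while the variant-like record mixes numeric and alphanumeric ones. That is the kind of distributional difference the abstract says a classifier can exploit.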

Cite


Wen, J., Gouripeddi, R., & Facelli, J. C. (2017). Metadata discovery of heterogeneous biomedical datasets using token-based features. In Lecture Notes in Electrical Engineering (Vol. 449, pp. 60–67). Springer Verlag. https://doi.org/10.1007/978-981-10-6451-7_8
