Patient cohort discovery across heterogeneous data sources is challenging and can involve a complicated process of data loading, harmonization, and querying. Most existing cohort identification tools store patient data in a relational database model implemented in SQL. However, SQL databases limit the maximum number of columns in a table, so high-dimensional data must be split across multiple tables, which degrades query performance. In this paper, we propose two NoSQL-based patient cohort query systems based on an existing SQL-based cross-cohort query interface for the National Sleep Research Resource (NSRR). We used eight NSRR datasets to evaluate the performance of the NoSQL-based and SQL-based systems in data loading, harmonization, and querying. In our experiments, the NoSQL-based approaches outperformed the SQL-based approach, suggesting that NoSQL-based systems are promising for developing patient cohort query systems across heterogeneous data sources.
CITATION STYLE
Zeng, N., Zhang, G. Q., Li, X., & Cui, L. (2017). Evaluation of Relational and NoSQL Approaches for Cohort Identification from Heterogeneous Data Sources in the National Sleep Research Resource. Journal of Health & Medical Informatics, 08(05). https://doi.org/10.4172/2157-7420.1000295