Big Data Cohort Extraction to Facilitate Machine Learning to Improve Statin Treatment Emerging Models for Secondary Data Analysis (SDA)

Chih-Lin Chi; Jin Wang; Thomas R Clancy; Jennifer G Robinson; Peter J Tonellato; Terrence J Adam

Journal Article

Western Journal of Nursing Research (2017) 39(1) 42-62

ISSN: 0193-9459

Abstract

Health care Big Data studies hold substantial promise for improving clinical practice. Among analytic tools, machine learning (ML) is an important approach that many industries use for data-driven decision support. Big Data sets commonly contain thousands of variables and millions of patient records, but most data elements cannot be used directly to support decision making. Although many feature-selection tools can help identify relevant data, these tools are typically insufficient to determine a patient data cohort to support learning. Therefore, domain experts with nursing or clinical knowledge play critical roles in determining value criteria or the types of variables that should be included in the patient cohort to maximize project success. We demonstrate this process by extracting a patient cohort (37,506 individuals) to support our ML work (i.e., the production of a proactive strategy to prevent statin adverse events) from 130 million de-identified lives in the OptumLabs™ Data Warehouse.

Interest in Big Data analyses and the speed of data growth are rapidly increasing. Among the available Big Data tools, ML techniques are typically used to empirically capture patterns from data, which can then be used for individualized prediction or other types of data-driven decision support. For this work to succeed, the key initial challenge is the selection of an appropriate data cohort to support ML. This article describes the first phase of a larger study and demonstrates how we identify patients for our cohort from the OptumLabs™ Data Warehouse (OLDW) for ML project work.

Chi, C.-L., Wang, J., Clancy, T. R., Robinson, J. G., Tonellato, P. J., & Adam, T. J. (2017). Big Data Cohort Extraction to Facilitate Machine Learning to Improve Statin Treatment Emerging Models for Secondary Data Analysis (SDA). Western Journal of Nursing Research, 39(1), 42–62. Retrieved from http://journals.sagepub.com/doi/pdf/10.1177/0193945916673059
