Improving cardiovascular risk prediction through machine learning modelling of irregularly repeated electronic health records

Abstract

Aims

Existing electronic health records (EHRs) often contain abundant but irregular longitudinal measurements of risk factors. In this study, we aimed to leverage such data to improve the prediction of atherosclerotic cardiovascular disease (ASCVD) risk by applying machine learning (ML) algorithms, which can allow automatic screening of the population.

Methods and results

A total of 215 744 Chinese adults aged 40 to 79 years without a history of cardiovascular disease were included (6081 cases) from an EHR-based longitudinal cohort study. To keep the model interpretable, the predictors were demographic characteristics, medication treatment, and repeatedly measured records of lipids, glycaemia, obesity, blood pressure, and renal function. The primary outcome was ASCVD, defined as non-fatal acute myocardial infarction, coronary heart disease death, or fatal and non-fatal stroke. eXtreme Gradient Boosting (XGBoost) and Least Absolute Shrinkage and Selection Operator (LASSO) regression models were derived to predict the 5-year ASCVD risk. In the validation set, compared with the refitted Chinese guideline-recommended Cox model (i.e. the China-PAR), the XGBoost model had a significantly higher C-statistic of 0.792 (difference in C-statistics: 0.011, 0.006–0.017, P < 0.001), with similar results for LASSO regression (difference in C-statistics: 0.008, 0.005–0.011, P < 0.001). The XGBoost model demonstrated the best calibration (men: Dx = 0.598, P = 0.75; women: Dx = 1.867, P = 0.08). Moreover, the risk distribution of the ML algorithms differed from that of the conventional model. The net reclassification improvement (NRI) of XGBoost and LASSO over the Cox model was 3.9% (1.4–6.4%) and 2.8% (0.7–4.9%), respectively.

Conclusion

Machine learning algorithms applied to irregular, repeated real-world data could improve cardiovascular risk prediction. They demonstrated significantly better reclassification performance in correctly identifying the high-risk population.

Lay summary

The usual cardiovascular risk assessment tools use a single measurement of a limited set of traditional risk factors. Existing EHRs often contain abundant longitudinal measurements and a wider range of predictors. These data could not only improve prediction accuracy but also allow automatic screening when the tool is embedded within the EHR system. ML approaches can accommodate irregular measurement records. This study therefore compares the performance of two ML models with the guideline-recommended model under real-world conditions, indicating that:
• Incorporating multiple irregularly and repeatedly measured predictors into simple ML algorithms is feasible and interpretable.
• The accuracy of risk prediction can be significantly improved, especially with regard to risk reclassification. Using the risk cut-offs recommended by the current guideline, the ML models allocate participants to risk groups more correctly than the guideline-recommended model.
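
The net reclassification improvement (NRI) reported in the abstract adds two parts: the net share of people who had an event and were moved to a higher risk category, and the net share of people without an event who were moved to a lower one. The sketch below computes the standard categorical NRI under loudly stated assumptions. Every participant has a fully observed binary outcome, so there is no censoring; the study's 5-year survival setting would need a censoring-adjusted estimate. The 5% and 10% cut-offs are hypothetical placeholders rather than the thresholds used in the study. The function names and the toy data are illustrative only and are not the authors' code.

```python
from typing import Sequence


def risk_category(risk: float, cutoffs: Sequence[float]) -> int:
    """Map a predicted risk to a category index (0 = lowest) given ascending cut-offs."""
    return sum(risk >= c for c in cutoffs)


def categorical_nri(
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    events: Sequence[int],
    cutoffs: Sequence[float] = (0.05, 0.10),  # hypothetical thresholds, not from the study
) -> float:
    """Categorical NRI of a new model over an old one.

    Assumes a fully observed binary outcome (1 = event, 0 = no event);
    censored survival data would need a Kaplan-Meier-based version.
    """
    up_events = down_events = n_events = 0
    up_nonevents = down_nonevents = n_nonevents = 0
    for old, new, event in zip(old_risk, new_risk, events):
        # Positive move = reclassified upward by the new model.
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if event:
            n_events += 1
            up_events += move > 0  # events should move up
            down_events += move < 0
        else:
            n_nonevents += 1
            up_nonevents += move > 0
            down_nonevents += move < 0  # non-events should move down
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_events - down_events) / n_events
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    return nri_events + nri_nonevents


# Toy data invented for illustration: three events followed by three non-events.
old = [0.03, 0.07, 0.12, 0.04, 0.08, 0.11]
new = [0.06, 0.11, 0.12, 0.02, 0.04, 0.09]
events = [1, 1, 1, 0, 0, 0]
print(round(categorical_nri(old, new, events), 3))  # 1.333
```

In the toy data, two of the three events move up a category and two of the three non-events move down, giving 2/3 + 2/3. The contrived numbers only show the arithmetic and are not comparable with the study's estimates.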

Cite (APA)

Li, C., Liu, X., Shen, P., Sun, Y., Zhou, T., Chen, W., … Gao, P. (2024). Improving cardiovascular risk prediction through machine learning modelling of irregularly repeated electronic health records. European Heart Journal - Digital Health, 5(1), 30–40. https://doi.org/10.1093/ehjdh/ztad058