Aims: Several risk factors have been identified that predict worse outcomes in patients with SARS-CoV-2 infection. Prediction models are needed to optimize clinical management and to stratify patients at higher mortality risk early. Machine learning (ML) algorithms represent a novel approach to identifying a prediction model with good discriminatory capacity that can be easily used in clinical practice. Methods and results: Cardio-COVID is a multicentre observational study of a cohort of consecutive adult Caucasian patients with laboratory-confirmed COVID-19 [by real-time reverse transcriptase-polymerase chain reaction (RT-PCR)] who were hospitalized in 13 Italian cardiology units from 1 March to 9 April 2020. Patients were followed up after the COVID-19 diagnosis, and all-cause in-hospital mortality or discharge was ascertained until 23 April 2020. Variables with more than 20% missing values were excluded. The Lasso procedure, with λ = 0.07, was used to reduce the number of covariates. Mortality was modelled with a Random Forest (RF). The dataset was randomly divided into two subsamples that preserved the proportion of deceased and surviving patients in the entire sample: the training set contained 80% of the data and the test set the remaining 20%. The training set was used in the calibration procedure, in which an RF modelled in-hospital mortality using the covariates selected by Lasso. Its accuracy was assessed with the ROC curve, yielding the AUC, sensitivity, and specificity, each with a 95% confidence interval (CI) computed from 10 000 stratified bootstrap replicates. The relative Variable Importance Measure (relVIM) was extracted from the RF to understand which of the selected variables had the greatest impact on the outcome, providing a ranking from the most important (relVIM = 100) to the least important variable. The resulting model was compared with a Gradient Boosting Machine (GBM) and with logistic regression, with cross-validated predictions.
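The stratified bootstrap interval for the AUC mentioned above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not name the software used, and the function names (`auc`, `stratified_bootstrap_ci`), the percentile method for the interval, and the simulated scores are all assumptions made for illustration. "Stratified" is taken here to mean that deceased and surviving patients are resampled separately, so every replicate keeps the original class sizes.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.
    Ties count as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(labels, scores, n_boot=10_000, level=0.95, seed=0):
    """Percentile CI for the AUC (an assumed choice of interval type).
    Each class is resampled with replacement on its own."""
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    boot_labels = [1] * len(pos) + [0] * len(neg)
    reps = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        reps.append(auc(boot_labels, boot_pos + boot_neg))
    reps.sort()
    tail = (1.0 - level) / 2.0
    lower = reps[int(tail * (n_boot - 1))]
    upper = reps[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper

# Hypothetical predicted risks: 15 deceased and 45 surviving patients.
rng = random.Random(42)
labels = [1] * 15 + [0] * 45
scores = ([rng.gauss(0.6, 0.2) for _ in range(15)]
          + [rng.gauss(0.4, 0.2) for _ in range(45)])

point = auc(labels, scores)
# 2000 replicates keep the demo fast; the abstract reports 10 000.
lower, upper = stratified_bootstrap_ci(labels, scores, n_boot=2000)
print(f"AUC = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Resampling within each class avoids replicates that contain no deaths, which would leave the AUC undefined in a small test set.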
Finally, to assess whether each model performed equally well in sample (training) and out of sample (test), the two AUCs were compared with DeLong's test. Among the 701 patients enrolled (mean age 67.2 ± 13.2 years, 69.5% males), 165 (23.5%) died during a median hospitalization of 15 (IQR, 9-24) days. The variables selected by the Lasso were age, oxygen saturation, PaO2/FiO2, creatinine clearance, and elevated troponin. Compared with those who survived, deceased patients were older and had lower blood oxygenation, lower creatinine clearance, and a higher prevalence of elevated troponin (all P<0.001). The training set included 561 patients and the test set 140 patients. The best out-of-sample performance was provided by the RF, with an AUC of 0.78 (95% CI: 0.68-0.88) and a sensitivity of 0.88 (95% CI: 0.58-1.00). Moreover, RF was the only method that provided similar performance in sample and out of sample (DeLong test P=0.78). In contrast, predictions were less accurate with GBM and logistic regression. The relVIM ranked the variables from the most to the least important in predicting the outcome as follows: creatinine clearance, PaO2/FiO2, age, oxygen saturation, and elevated troponin. Conclusions: In a large COVID-19 population, we showed that a customizable ML-based score derived from clinical variables is feasible and effective for the prediction of in-hospital mortality.
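The stratified 80/20 split can be sketched with the reported counts (701 patients, 165 deaths). This is an illustrative sketch only: `stratified_split`, the rounding rule, and the random seed are assumptions, since the abstract does not describe how the authors implemented the split. Under this rounding rule the subset sizes come out at 561 and 140, which agrees with the reported training and test set sizes.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class separately so that
    both subsets keep roughly the same share of each outcome."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # assumed rounding rule
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

# Outcome vector built from the reported counts: 1 = died, 0 = survived.
labels = [1] * 165 + [0] * (701 - 165)
train_idx, test_idx = stratified_split(labels)

train_deaths = sum(labels[i] for i in train_idx)
test_deaths = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
print(f"mortality: train {train_deaths / len(train_idx):.1%}, "
      f"test {test_deaths / len(test_idx):.1%}")
```

Splitting within each outcome class keeps in-hospital mortality close to the overall 23.5% in both subsets, which matters when the test set contains only a few dozen deaths.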
CITATION STYLE
Paris, S., Inciardi, R. M., Specchia, C., Vezzoli, M., Oriecuia, C., Lombardi, C. M., … Metra, M. (2021). 554 Machine learning for prediction of in-hospital mortality in COVID-19 patients: results from an Italian multicentre study. European Heart Journal Supplements, 23(Supplement_G). https://doi.org/10.1093/eurheartj/suab135.035