Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach

Jonathan Thomas; Linda M Liao; Rashmi Sinha; Tushar Patel; Samuel O Antwi

Journal ArticleOPEN ACCESS

Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach

Thomas J
Liao L
Sinha R
et al.

Journal of Hepatocellular Carcinoma (2022) Volume 9 69-81

DOI: 10.2147/jhc.s341045

N/ACitations

17Readers

Get full text

Abstract

Background: Prediction of hepatocellular carcinoma (HCC) development in persons with known risk factors remain a challenge and is an urgent unmet need, considering projected increases in HCC incidence and mortality in the US. We aimed to use machine learning techniques to identify a set of demographic, lifestyle, and health history information that can be used simultaneously for population-level HCC risk prediction. Methods: Data from 377,065 participants of the NIH-AARP Diet and Health Study, among whom 647 developed HCC over 16 years of follow-up, were analyzed. The sample was randomly divided into independent training (60%) and validation (40%) sets. We evaluated 123 participant characteristics and tested 15 different machine learning algorithms for robustness in predicting HCC risk. Separately, we evaluated variables selected from multivariable logistic regression for risk prediction. Results: The random under-sampling boosting (RUSBoost) algorithm performed best during model testing. Fourteen participant characteristics were selected for risk prediction based on differences between cases and controls (Bonferroni-corrected p-values <0.0004) and from the most frequently used variables in the initial two decision trees of the RUSBoost learner trees. A predictive model based on the 14 variables had an AUC of 0.72 (sensitivity=0.68, specificity=0.63) and independent validation AUC of 0.65 (sensitivity=0.68, specificity=0.63). A subset of 9 variables identified through logistic regression also had an AUC of 0.72 (sensitiv-ity=0.67, specificity=0.63) and independent validation AUC of 0.65 (sensitivity=0.70, specificity=0.61). Conclusion: Population-level HCC risk prediction can be performed with a machine learning-based algorithm and could inform strategies for improving HCC risk reduction in at-risk groups.

Cite

CITATION STYLE

APA

Thomas, J., Liao, L. M., Sinha, R., Patel, T., & Antwi, S. O. (2022). Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach. Journal of Hepatocellular Carcinoma, Volume 9, 69–81. https://doi.org/10.2147/jhc.s341045

Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach

Abstract

Cite

Register to see more suggestions