APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support

Jethro C.C. Kwong; Adree Khondker; Katherine Lajkosz; Matthew B.A. McDermott; Xavier Borrat Frigola; Melissa D. McCradden; Muhammad Mamdani; Girish S. Kulkarni; Alistair E.W. Johnson

Journal Article (Open Access)

JAMA Network Open (2023) 6(9)

DOI: 10.1001/jamanetworkopen.2023.35377

74 Citations

92 Readers

Abstract

Importance: Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question.

Objective: To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support.

Design, Setting, and Participants: This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022.

Main Outcomes and Measures: The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.

Results: A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P

Citation (APA)

Kwong, J. C. C., Khondker, A., Lajkosz, K., McDermott, M. B. A., Frigola, X. B., McCradden, M. D., … Johnson, A. E. W. (2023). APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. JAMA Network Open, 6(9). https://doi.org/10.1001/jamanetworkopen.2023.35377
