Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study

  • Chen Y
  • Dong M
  • Sun J
  • et al.

Abstract

Background: Although the Coronary Artery Disease Reporting and Data System (CAD-RADS) provides a standardized approach, radiologists continue to favor free-text reports. This preference makes data extraction and analysis difficult in longitudinal studies and may limit large-scale research and quality assessment initiatives.

Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.

Methods: This retrospective study analyzed CCTA reports from January 2024 to July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language model (LLM) to extract CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o application programming interface (API) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using the Cohen κ coefficient.

Results: Among 999 patients (median age 66 y, range 58‐74; 650 males), CAD-RADS categorization showed accuracy of 0.98‐1.00 (95% CI 0.9730‐1.0000), sensitivity of 0.95‐1.00 (95% CI 0.9191‐1.0000), specificity of 0.98‐1.00 (95% CI 0.9669‐1.0000), and F1-score of 0.96‐1.00 (95% CI 0.9253‐1.0000). P categories showed accuracy of 0.97‐1.00 (95% CI 0.9569‐0.9990), sensitivity of 0.90‐1.00 (95% CI 0.8085‐1.0000), specificity of 0.97‐1.00 (95% CI 0.9533‐1.0000), and F1-score of 0.91‐0.99 (95% CI 0.8377‐0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680‐0.9870), and noncalcified coronary plaque detection achieved an accuracy of 0.98 (95% CI 0.9680‐0.9870). Cohen κ values for all classifications exceeded 0.98.

Conclusions: The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and noncalcified plaques.
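The per-category metrics named in the abstract (accuracy, sensitivity, specificity, F1-score) and the Cohen κ coefficient can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `one_vs_rest_metrics` and `cohen_kappa` are hypothetical, the toy labels are invented for illustration, and the one-vs-rest treatment of each category is an assumption about how per-category ranges such as "0.95‐1.00" might be obtained.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, label):
    """Score one category against all others (assumed one-vs-rest scheme)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa between two label sequences over the same reports."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of marginal frequencies, summed over labels.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) if both sequences use a single label.
    return (observed - expected) / (1 - expected)


# Invented toy data: radiologist ground truth vs. hypothetical model output.
truth = ["0", "1", "2", "3", "3", "4A", "5", "N"]
model = ["0", "1", "2", "3", "2", "4A", "5", "N"]

metrics_cat3 = one_vs_rest_metrics(truth, model, "3")
kappa = cohen_kappa(truth, model)
```

In this sketch, `cohen_kappa` takes any two label sequences; for an intrarater check, the two inputs would be labels from repeated passes over the same reports, which is an assumption about the study's procedure rather than a detail stated in the abstract.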

Cite

Chen, Y., Dong, M., Sun, J., Meng, Z., Yang, Y., Muhetaier, A., … Qin, J. (2025). Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study. JMIR Medical Informatics, 13, e70967. https://doi.org/10.2196/70967