Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study

Youmei Chen; Mengshi Dong; Jie Sun; Zhanao Meng; Yiqing Yang; Abudushalamu Muhetaier; Chao Li; Jie Qin

Journal Article

JMIR Medical Informatics (2025) 13 e70967-e70967

DOI: 10.2196/70967

Abstract

Background: Although the Coronary Artery Disease Reporting and Data System (CAD-RADS) provides a standardized approach, radiologists continue to favor free-text reports. This preference complicates data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.

Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.

Methods: This retrospective study analyzed CCTA reports from January 2024 to July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language model (LLM) to extract CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o API (application programming interface) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using the Cohen κ coefficient.

Results: Among 999 patients (median age 66 y, range 58‐74; 650 males), CAD-RADS categorization showed accuracy of 0.98‐1.00 (95% CI 0.9730‐1.0000), sensitivity of 0.95‐1.00 (95% CI 0.9191‐1.0000), specificity of 0.98‐1.00 (95% CI 0.9669‐1.0000), and F1-score of 0.96‐1.00 (95% CI 0.9253‐1.0000). P categories demonstrated accuracy of 0.97‐1.00 (95% CI 0.9569‐0.9990), sensitivity of 0.90‐1.00 (95% CI 0.8085‐1.0000), specificity of 0.97‐1.00 (95% CI 0.9533‐1.0000), and F1-score of 0.91‐0.99 (95% CI 0.8377‐0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680‐0.9870), and noncalcified coronary plaque detection showed an accuracy of 0.98 (95% CI 0.9680‐0.9870). Cohen κ values for all classifications exceeded 0.98.

Conclusions: The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and noncalcified plaques.
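The evaluation metrics named in the Methods (accuracy, sensitivity, specificity, F1-score, and Cohen κ) can be illustrated with a minimal Python sketch. This is not the authors' script: the function names `one_vs_rest_metrics` and `cohen_kappa` and the toy labels are assumptions made purely for illustration, and the per-category metrics are computed one-vs-rest, which is one common reading of a range reported across categories.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Treat `positive` as the target category and every other label as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return NaN instead of raising when a category has no cases.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    truth = ["0", "1", "1", "2", "2", "2"]
    model = ["0", "1", "2", "2", "2", "2"]
    for category in sorted(set(truth)):
        print(category, one_vs_rest_metrics(truth, model, category))
    print("kappa", cohen_kappa(truth, model))
```

In practice, `truth` would hold the radiologist-assigned categories and `model` the categories parsed from the model output, with one entry per report.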

Citation (APA)

Chen, Y., Dong, M., Sun, J., Meng, Z., Yang, Y., Muhetaier, A., … Qin, J. (2025). Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study. JMIR Medical Informatics, 13, e70967–e70967. https://doi.org/10.2196/70967
