Abstract
Artificial intelligence has demonstrated a promising potential for diagnosing complex medical cases, with Generative Pre-Trained Transformer 4 (GPT-4) being the most recent advancement in this field. This study evaluated the diagnostic performance of the GPT-4 in comparison with that of its predecessor, GPT-3.5, using 81 complex medical case records from the New England Journal of Medicine. The cases were categorized as cognitive impairment, infectious disease, rheumatology, or drug reactions. The GPT-4 achieved a primary diagnostic accuracy of 38.3%, which improved to 71.6% when differential diagnoses were included. In 84.0% of cases, primary diagnoses were made by conducting investigations suggested by GPT-4. GPT-4 outperformed GPT-3.5 in all subspecialties except for drug reactions. GPT-4 demonstrated the highest performance in infectious diseases and drug reactions, whereas it underperformed in cases of cognitive impairment. These findings indicate that GPT-4 can provide reasonably accurate diagnoses, comprehensive differential diagnoses, and appropriate investigations. However, its performance varies across subspecialties.
Author supplied keywords
Cite
CITATION STYLE
Luk, D. W. A., Ip, W. C. T., & Shea, Y. F. (2024). Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties. Journal of the Chinese Medical Association, 87(3), 259–260. https://doi.org/10.1097/JCMA.0000000000001064
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.