Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation

Mehmet Gün

Journal Article (Open Access)

BMC Emergency Medicine (2025) 25(1)

DOI: 10.1186/s12873-025-01303-y

Abstract

Background: Large language models (LLMs) such as ChatGPT are increasingly being explored for clinical decision support. However, their performance in high-stakes emergency scenarios remains underexamined. This study aimed to evaluate ChatGPT's diagnostic and therapeutic accuracy compared with that of a board-certified emergency physician across diverse emergency cases.

Methods: This comparative study used 15 standardized emergency scenarios sourced from validated academic platforms (Geeky Medics, Life in the Fast Lane, Emergency Medicine Cases). ChatGPT (GPT-4) and the physician independently evaluated each case on five predefined parameters: diagnosis, investigations, initial treatment, clinical safety, and decision-making complexity. Each case was scored out of 5, and concordance was categorized as high (5/5), moderate (4/5), or low (≤ 3/5). Wilson 95% confidence intervals (CIs) were calculated for each concordance category.

Results: ChatGPT achieved high concordance (5/5) in 8 cases (53.3%, 95% CI: 27.6–77.0%), moderate concordance (4/5) in 4 cases (26.7%, 95% CI: 10.3–55.4%), and low concordance (≤ 3/5) in 3 cases (20.0%, 95% CI: 6.0–45.6%). Performance was strongest in structured, protocol-based conditions such as STEMI, DKA, and asthma, and weaker in complex scenarios such as stroke, trauma with shock, and mixed acid-base disturbances.

Conclusion: ChatGPT showed strong alignment with emergency physician decisions in structured scenarios but lacked reliability in complex cases. While AI may enhance decision-making and education, it cannot replace the clinical reasoning of human physicians; its role is best framed as a supportive tool rather than a substitute.
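The scoring and interval steps described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's own analysis code. The abstract says only that each case was "scored out of 5", so the rule "one point per parameter on which ChatGPT agrees with the physician" is an assumption, and the names `PARAMETERS`, `case_score`, `concordance_category` and `wilson_interval` are hypothetical. The category counts (8, 4 and 3 of 15) are taken from the Results.

```python
import math
from collections import Counter

# The five predefined parameters listed in the abstract (key names are hypothetical).
PARAMETERS = ("diagnosis", "investigations", "initial_treatment",
              "clinical_safety", "decision_complexity")


def case_score(agreement: dict[str, bool]) -> int:
    """ASSUMED scoring rule: one point per parameter on which ChatGPT's
    answer agrees with the physician's, giving a score from 0 to 5."""
    return sum(1 for p in PARAMETERS if agreement[p])


def concordance_category(score: int) -> str:
    """Map a 0-5 case score to the abstract's concordance categories."""
    if score == 5:
        return "high"
    if score == 4:
        return "moderate"
    return "low"  # score <= 3


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Textbook Wilson score interval for k successes in n trials
    (z = 1.96 corresponds to a 95% interval)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Category counts as reported in the Results (n = 15 cases).
    counts = Counter({"high": 8, "moderate": 4, "low": 3})
    n = sum(counts.values())
    for category in ("high", "moderate", "low"):
        k = counts[category]
        lo, hi = wilson_interval(k, n)
        print(f"{category:>8}: {k}/{n} = {k / n:.1%}, "
              f"95% Wilson CI {lo:.1%} to {hi:.1%}")
```

The abstract does not specify the exact interval formula beyond "Wilson", and this textbook version yields bounds slightly different from the printed ones (roughly 30.1% to 75.2% for 8 of 15). Treat the sketch as an illustration of the method rather than a reproduction of the reported figures.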

Citation (APA)

Gün, M. (2025). Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation. BMC Emergency Medicine, 25(1). https://doi.org/10.1186/s12873-025-01303-y
