Performance and Consistency of ChatGPT-4 Versus Otolaryngologists: A Clinical Case Series

Jérôme R. Lechien; Matthew R. Naunheim; Antonino Maniaci; Thomas Radulesco; Alberto M. Saibene; Carlos M. Chiesa-Estomba; Luigi A. Vaira

Journal Article, Open Access

Otolaryngology - Head and Neck Surgery (United States) (2024) 170(6) 1519-1526

DOI: 10.1002/ohn.759

Abstract

Objective: To study the performance of Chatbot Generative Pretrained Transformer-4 (ChatGPT-4) in the management of cases in otolaryngology–head and neck surgery.

Study Design: Prospective case series.

Setting: Multicenter university hospitals.

Methods: The history and the clinical, physical, and additional examination findings of adult outpatients consulting in the otolaryngology departments of CHU Saint-Pierre and Dour Medical Center were presented to ChatGPT-4, which was asked for differential diagnoses, management, and treatment(s). According to specialty, the ChatGPT-4 responses were assessed by 2 distinct, blinded, board-certified otolaryngologists with the Artificial Intelligence Performance Instrument.

Results: One hundred cases were presented to ChatGPT-4. ChatGPT-4 indicated a mean of 3.34 (95% confidence interval [CI]: 3.09, 3.59) additional examinations per patient versus 2.10 (95% CI: 1.76, 2.34; P = .001) for the practitioners. There was strong consistency (k > 0.600) between otolaryngologists and ChatGPT-4 for the indication of upper aerodigestive tract endoscopy, positron emission tomography and computed tomography, audiometry, tympanometry, and psychophysical evaluations. The primary diagnosis made by ChatGPT-4 was correct in 38% to 86% of cases, depending on subspecialty. Additional examinations indicated by ChatGPT-4 were pertinent and necessary in 8% to 31% of cases, while the treatment regimen was pertinent in 12% to 44% of cases. The performance of ChatGPT-4 was not influenced by the human-reported level of difficulty of clinical cases.

Conclusion: ChatGPT-4 may be a promising adjunctive tool in otolaryngology, providing extensive documentation about additional examinations, primary and differential diagnoses, and treatments. ChatGPT-4 is more effective in providing a primary diagnosis and less effective in the selection of additional examinations and treatments.
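For readers unfamiliar with the consistency statistic, the sketch below shows how an agreement coefficient of this kind can be computed for a single examination. It assumes that the reported k denotes Cohen's kappa between the clinician's and the chatbot's binary indication (indicated or not indicated); the abstract does not state which variant was used. The function name `cohen_kappa` and the ten ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy ratings (NOT study data): 1 = examination indicated, 0 = not.
clinician = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
chatbot = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]

kappa = cohen_kappa(clinician, chatbot)
print(f"kappa = {kappa:.3f}")
```

In this toy example the two raters agree on 9 of 10 cases, while agreement expected by chance is 0.5, so kappa corrects the raw 90% agreement downward. That correction is why kappa, rather than simple percent agreement, is commonly reported for rater consistency.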

Cite

Lechien, J. R., Naunheim, M. R., Maniaci, A., Radulesco, T., Saibene, A. M., Chiesa-Estomba, C. M., & Vaira, L. A. (2024). Performance and Consistency of ChatGPT-4 Versus Otolaryngologists: A Clinical Case Series. Otolaryngology - Head and Neck Surgery (United States), 170(6), 1519–1526. https://doi.org/10.1002/ohn.759
