ChatGPT Performance on 120 Interdisciplinary Allergology Questions—Systematic Evaluation With Clinical Error Impact Assessment for Critical Erroneous AI-Guided Chatbot Advice


Abstract

Background: ChatGPT (Chatbot with Generative Pretrained Transformer), despite not being a medical device, may be used by patients for medical inquiries. Its accessibility and convenience, particularly amid long waiting times for allergology appointments, make it an attractive but potentially erroneous source of advice.

Objectives: This study evaluates ChatGPT's performance on allergological questions from clinical practice, offering a systematic approach to rating its errors. An Allergological Error Impact Assessment is proposed to analyze the potential consequences of these errors for patients.

Methods: A total of 120 multidisciplinary allergology questions from dermatology, pediatrics, and pulmonology were prompted to ChatGPT (3.5). Responses were assessed for content, accuracy (ACC), completeness (CO), perceived humanness (PHU), and readability (Flesch Reading Ease). Erroneous responses were categorized on a 3-step severity scale (minor, major, and critical). Critical errors underwent allergological error impact analysis. Statistical evaluation included descriptive analyses and Kruskal-Wallis and Mann-Whitney U tests.

Results: ChatGPT demonstrated good accuracy (mean ACC 4.1/5, standard deviation: 0.78, range: 1-5). CO and PHU were sufficient but lowest for pediatric queries. Readability was at an academic level for most responses. Six critical errors were identified: 1 in dermatology, 2 in pediatrics, and 3 in pulmonology. Notably, a critical pediatric food allergen error carried a potentially life-threatening risk.

Conclusion: ChatGPT's imperfect reliability in allergology highlights the need for expert counseling in specialized fields. Tailoring these tools to allergy use cases could improve the utility of models like ChatGPT for clinical applications, such as answering questions from routine allergological care.

Citation (APA)

Mathes, S., Seurig, S., Bluhme, F., Beyer, K., Heizmann, F., Wagner, M., … Darsow, U. (2025). ChatGPT Performance on 120 Interdisciplinary Allergology Questions—Systematic Evaluation With Clinical Error Impact Assessment for Critical Erroneous AI-Guided Chatbot Advice. Journal of Allergy and Clinical Immunology: In Practice, 13(6), 1350-1357.e4. https://doi.org/10.1016/j.jaip.2025.03.030
