Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

Adi Lahat; Kassem Sharif; Narmin Zoabi; Yonatan Shneor Patt; Yousra Sharif; Lior Fisher; Uria Shani; Mohamad Arow; Roni Levin; Eyal Klang

Journal ArticleOPEN ACCESS

Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

Journal of Medical Internet Research (2024) 26(1)

DOI: 10.2196/54571

42Citations

98Readers

Abstract

Background: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. Objective: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors’ and residents’ ratings, and specific question types. Methods: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. Results: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P

Cite

CITATION STYLE

APA

Lahat, A., Sharif, K., Zoabi, N., Patt, Y. S., Sharif, Y., Fisher, L., … Klang, E. (2024). Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. Journal of Medical Internet Research, 26(1). https://doi.org/10.2196/54571

Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

Abstract

Author supplied keywords

Cite

Register to see more suggestions