Evaluating ChatGPT-4 in the development of family medicine residency examinations

Hanu Chaudhari; Christopher Meaney; Kulamakan Kulasegaram; Fok Han Leung

Journal Article, Open Access

PLOS Digital Health (2025) 4(12 December)

DOI: 10.1371/journal.pdig.0001156

Abstract

Creating high-quality medical examinations is challenging because of the time, cost, and training required. This study evaluates the use of ChatGPT 4.0 (ChatGPT-4) to generate medical exam questions for postgraduate family medicine (FM) trainees. The objectives were to develop a standardized method for creating postgraduate multiple-choice medical exam questions with ChatGPT-4 and to compare the effectiveness of large language model (LLM) generated questions with that of questions created by human experts. Eight academic FM physicians rated multiple-choice questions (MCQs) in four categories: 1) human-generated, 2) ChatGPT-4 cloned, 3) ChatGPT-4 novel, and 4) ChatGPT-4 generated and then edited by a human expert. Raters scored each question on 17 quality domains, and quality scores were compared using linear mixed effects models. Both ChatGPT-4 and human-generated questions were rated as high quality and as addressing higher-order thinking. Human-generated questions were less likely than ChatGPT-4 generated questions to be perceived as artificial intelligence (AI) generated. For several quality domains, ChatGPT-4 generated questions were non-inferior (at a 10% margin), but not superior, to human-generated questions. ChatGPT-4 can therefore create medical exam questions that are high quality and, for certain quality domains, non-inferior to those developed by human experts. LLMs can assist in generating and appraising educational content, with potential cost and time savings.
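The distinction the abstract draws between "non-inferior" and "superior" can be illustrated with a minimal sketch. The abstract does not say how the 10% margin was defined or which interval was used, so everything below is an assumption for illustration: the margin is taken as a fraction of the human-generated mean score, higher scores are taken to be better, and the `Comparison` class, the `classify` function, and the numbers are hypothetical rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Estimated difference in mean quality score (LLM minus human) with a CI."""
    diff: float       # point estimate of the difference
    ci_lower: float   # lower bound of the confidence interval
    ci_upper: float   # upper bound of the confidence interval


def classify(comp: Comparison, reference_mean: float,
             margin_fraction: float = 0.10) -> str:
    """Classify a comparison as 'superior', 'non-inferior', or 'inconclusive'.

    Assumes higher scores are better and that the margin is a fraction of the
    reference (human) mean. Both are illustrative assumptions, not the
    study's documented definition.
    """
    margin = margin_fraction * reference_mean
    if comp.ci_lower > 0:
        # The whole interval lies above zero: better than the reference.
        return "superior"
    if comp.ci_lower > -margin:
        # The interval excludes a deficit as large as the margin.
        return "non-inferior"
    return "inconclusive"


# Made-up numbers for illustration only (not results from the study).
example = Comparison(diff=-0.1, ci_lower=-0.3, ci_upper=0.1)
result = classify(example, reference_mean=4.0)
print(result)
```

Under these assumptions, a domain is non-inferior but not superior when the lower confidence bound lies above the negative margin yet does not exceed zero, which matches the pattern the abstract reports for several quality domains.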

Citation (APA)

Chaudhari, H., Meaney, C., Kulasegaram, K., & Leung, F. H. (2025). Evaluating ChatGPT-4 in the development of family medicine residency examinations. PLOS Digital Health, 4(12 December). https://doi.org/10.1371/journal.pdig.0001156
