Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study

Naritsaret Kaewboonlert; Jiraphon Poontananggul; Natthipong Pongsuwan; Gun Bhakdisongkhram

Journal Article, Open Access

JMIR Medical Education (2025) 11

DOI: 10.2196/58898

6 Citations

74 Readers

Abstract

Background: Artificial intelligence (AI) is now widely applied across many fields, including medical education. The validity of the content and answers that AI models generate depends on the training datasets and the optimization of each model. The accuracy of large language models (LLMs) in basic medical examinations, and the factors related to that accuracy, have also been explored.

Objective: We evaluated factors associated with the accuracy of LLMs (GPT-3.5, GPT-4, Google Bard, and Microsoft Bing) in answering multiple-choice questions from basic medical science examinations.

Methods: We used questions that were closely aligned with the content and topic distribution of Thailand’s Step 1 National Medical Licensing Examination. For each question, we collected the difficulty index, the discrimination index, and question characteristics. The questions were then input simultaneously into ChatGPT (with GPT-3.5 and GPT-4), Microsoft Bing, and Google Bard, and the responses were recorded. The accuracy of these LLMs and the associated factors were analyzed using multivariable logistic regression to assess the effect of each factor on model accuracy, with results reported as odds ratios (ORs).

Results: GPT-4 was the top-performing model, with an overall accuracy of 89.07% (95% CI 84.76%‐92.41%), significantly outperforming the others (P
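The two statistics named in the abstract, an accuracy with a 95% CI and an odds ratio, can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not say which confidence interval method was used, so the Wilson score interval is shown as one common choice; the odds ratio here is the unadjusted estimate from a single 2x2 table with a Wald interval, which is simpler than the multivariable logistic regression the authors describe; and all counts are hypothetical, not data from the study.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (one common CI method;
    the study's actual method is not stated in the abstract)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald CI from a 2x2 table.

    a: correct answers when the factor is present
    b: incorrect answers when the factor is present
    c: correct answers when the factor is absent
    d: incorrect answers when the factor is absent
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high


# Hypothetical counts for illustration only (not taken from the study).
acc_low, acc_high = wilson_interval(correct=90, total=100)
or_est, or_low, or_high = odds_ratio(a=40, b=10, c=30, d=20)
print(f"accuracy 90.00% (95% CI {acc_low:.2%} to {acc_high:.2%})")
print(f"OR {or_est:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

In a multivariable model, each reported OR would instead be the exponential of a fitted regression coefficient, adjusted for the other variables in the model.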

Citation (APA)

Kaewboonlert, N., Poontananggul, J., Pongsuwan, N., & Bhakdisongkhram, G. (2025). Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study. JMIR Medical Education, 11. https://doi.org/10.2196/58898
