Purpose: In late 2022 and early 2023, reports that ChatGPT could pass the United States Medical Licensing Examination (USMLE) generated considerable excitement, and media coverage suggested that ChatGPT has credible medical knowledge. This report analyzes the extent to which an artificial intelligence (AI) agent's performance on the publicly available sample items used in those reports can generalize to performance on an actual USMLE examination, using ChatGPT as an illustration.

Method: As in earlier investigations, analyses were based on publicly available USMLE sample items. Each item was submitted to ChatGPT (version 3.5) 3 times to evaluate the stability of its responses. Responses were scored following rules that match operational practice, and a preliminary analysis explored the characteristics of the items that ChatGPT answered correctly. The study was conducted between February and March 2023.

Results: For the full sample of items, ChatGPT scored above 60% correct except for one replication for Step 3. Response success varied across replications for 76 items (20%). There was a modest correspondence with item difficulty: ChatGPT was more likely to respond correctly to items that examinees found easier. ChatGPT performed significantly worse (P
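The Method amounts to a small replication protocol: submit each item several times, score each response against the key, and flag items whose replications disagree. The paper does not say how items were submitted (early-2023 access to ChatGPT was typically through the web interface), so the following Python sketch is only an assumed programmatic rendering via the OpenAI chat completions API; `score_response` is a hypothetical placeholder for the operational scoring rules, not the authors' actual procedure.

```python
# Hedged sketch of the replication protocol: submit each USMLE-style
# item n times, score each response, and check whether the replications
# agree. Assumes programmatic access via the OpenAI chat completions
# API; the study itself does not specify the submission mechanism.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_item(item_text: str) -> str:
    """Submit one item and return the model's raw answer text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": item_text}],
    )
    return response.choices[0].message.content


def score_response(answer: str, key: str) -> bool:
    """Hypothetical stand-in for the operational scoring rules:
    a naive check that the keyed option letter appears in the answer."""
    return key.strip().upper() in answer.upper()


def replicate_item(item_text: str, answer_key: str, n: int = 3) -> list[bool]:
    """Score n independent submissions of the same item (True = correct)."""
    return [score_response(ask_item(item_text), answer_key) for _ in range(n)]


def is_stable(results: list[bool]) -> bool:
    """An item is stable when all replications agree (all right or all wrong)."""
    return len(set(results)) == 1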
Yaneva, V., Baldwin, P., Jurich, D. P., Swygert, K., & Clauser, B. E. (2024). Examining ChatGPT Performance on USMLE Sample Items and Implications for Assessment. Academic Medicine, 99(2), 192–197. https://doi.org/10.1097/ACM.0000000000005549