Abstract
Background and Aims: ChatGPT is a popular large language model with potential educational applications in medicine. However, its performance in standardized, multi-disciplinary medical exams has not been comprehensively assessed. This study evaluates ChatGPT's accuracy and quality in Iran's national medical pre-internship exam.

Methods: We tested ChatGPT (GPT-3.5, May 3rd version) on 195 multiple-choice questions from the March 2022 Iranian pre-internship exam, covering 23 medical specialties. Questions with visual content were excluded. Each question was asked in a new chat to avoid memory bias. Responses were evaluated by 55 experts using a 5-point Likert scale and compared against the official answer key. Data were analyzed descriptively using SPSS.

Results: ChatGPT answered 68.6% of questions correctly. Expert ratings averaged 4.23/5 (SD = 1.21), indicating good to excellent quality. Best-performing specialties included pharmacology (85.7%), otorhinolaryngology (83.3%), and dermatology (83.3%). Lower performance was observed in pulmonology (42.9%) and epidemiology (50%).

Conclusion: ChatGPT shows promise as a supplemental educational tool in medical education, but its accuracy varies by specialty. Faculty guidance is essential to ensure responsible integration until further improvements and validations are made.
Motaghi Niko, M., Karbasi, Z., Kazemi, M., & Zahmatkeshan, M. (2026). Examining the Performance of ChatGPT in Comprehensive Pre-Internship Exam: The Potential of Artificial Intelligence in Medical Education. Health Science Reports, 9(1). https://doi.org/10.1002/hsr2.71492