Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination

Abstract

Background: Taiwan is well known for the quality of its healthcare system, and the country's medical licensing examinations offer a way to evaluate ChatGPT's medical proficiency. Methods: We analyzed examination data from February 2022, July 2022, February 2023, and July 2023. Each examination comprised four papers of 80 single-choice questions, categorized as descriptive or picture-based. We evaluated ChatGPT-4; questions it answered incorrectly were re-asked using a "chain of thought" (CoT) prompting approach. Accuracy rates were calculated as percentages. Results: ChatGPT-4's accuracy across the examinations ranged from 63.75% to 93.75% (February 2022 to July 2023). The highest accuracy (93.75%) was achieved on the Medicine (3) paper of the February 2022 examination. The subjects with the highest rates of incorrect answers were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). With CoT prompting, accuracy on the initially missed questions ranged from 0.00% to 88.89%, and the final overall accuracy ranged from 90% to 98%. Conclusion: ChatGPT-4 passed Taiwan's advanced medical licensing examinations, and CoT prompting improved its accuracy to over 90%.
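The two-pass procedure described in the abstract (a first pass over all 80 questions per paper, then a "chain of thought" re-prompt on the items answered incorrectly) can be sketched as follows. This is only an illustrative outline under assumptions: the `query_model` helper and the question dictionary fields (`text`, `key`) are hypothetical placeholders, not part of the study's published materials or of any specific API.

```python
# Minimal sketch of the scoring workflow described in the abstract.
# `query_model` is a hypothetical stand-in for whatever interface is used
# to submit a question to ChatGPT-4 and read back its chosen option.

def query_model(question: str, prompt_prefix: str = "") -> str:
    """Hypothetical helper that returns the model's chosen option (e.g. 'A')."""
    raise NotImplementedError("Replace with an actual call to the model.")

def accuracy(correct: int, total: int) -> float:
    """Accuracy expressed as a percentage, e.g. 75/80 -> 93.75."""
    return 100.0 * correct / total

def grade_paper(questions: list[dict]) -> dict:
    """Grade one 80-question paper; re-ask missed items with a CoT prompt."""
    first_pass_correct = 0
    wrong_items = []
    for q in questions:
        answer = query_model(q["text"])
        if answer == q["key"]:
            first_pass_correct += 1
        else:
            wrong_items.append(q)

    # Second pass: "chain of thought" prompting on items missed initially.
    cot_correct = 0
    for q in wrong_items:
        answer = query_model(q["text"], prompt_prefix="Let's think step by step. ")
        if answer == q["key"]:
            cot_correct += 1

    total = len(questions)
    return {
        "first_pass_accuracy": accuracy(first_pass_correct, total),
        "cot_accuracy_on_missed": accuracy(cot_correct, len(wrong_items)) if wrong_items else None,
        "final_accuracy": accuracy(first_pass_correct + cot_correct, total),
    }
```

Under this reading, the abstract's "final overall accuracy" presumably corresponds to the first-pass correct answers plus those recovered by CoT re-prompting, divided by the total number of questions.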

Citation (APA)

Lin, S. Y., Chan, P. K., Hsu, W. H., & Kao, C. H. (2024). Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination. Digital Health, 10. https://doi.org/10.1177/20552076241237678
