Background: Taiwan is well known for its high-quality healthcare system. The country's medical licensing exams offer a way to evaluate ChatGPT's medical proficiency. Methods: We analyzed exam data from February 2022, July 2022, February 2023, and July 2023. Each exam included four papers with 80 single-choice questions, grouped as descriptive or picture-based. We used ChatGPT-4 for the evaluation. Questions answered incorrectly were re-prompted with a “chain of thought” (CoT) approach. Accuracy rates were calculated as percentages. Results: ChatGPT-4's accuracy in the medical exams ranged from 63.75% to 93.75% (February 2022–July 2023). The highest accuracy (93.75%) was in February 2022's Medicine Exam (3). The subjects with the highest rates of incorrect answers were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). With CoT prompting, accuracy on the re-prompted questions ranged from 0.00% to 88.89%, and the final overall accuracy rate ranged from 90% to 98%. Conclusion: ChatGPT-4 passed Taiwan's medical licensing exams, and the “chain of thought” prompt improved its accuracy to over 90%.
CITATION STYLE
Lin, S. Y., Chan, P. K., Hsu, W. H., & Kao, C. H. (2024). Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination. Digital Health, 10. https://doi.org/10.1177/20552076241237678