Abstract
Background: The reliability of GPT-4, a state-of-the-art large language model with capabilities in clinical reasoning and medical knowledge, remains largely unverified in non-English languages.
Objective: This study compares the fundamental clinical competencies of Japanese residents and GPT-4 using the General Medicine In-Training Examination (GM-ITE).
Methods: We used the GPT-4 model provided by OpenAI and the GM-ITE questions from 2020, 2021, and 2022 to conduct a comparative analysis, evaluating the performance of residents completing their second year of residency against that of GPT-4. Given GPT-4's current capabilities, we included only single-choice questions and excluded those involving audio, video, or image data. The questions covered 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. We also classified the questions into 7 specialty fields and 3 difficulty levels, the latter determined from residents’ correct response rates.
Results: Across 137 GM-ITE questions in Japanese, GPT-4 scored significantly higher than the residents' mean (residents: 55.8%, GPT-4: 70.1%; P
Author-supplied keywords
- Asia
- Asian
- ChatGPT
- ChatGPT-4
- GM-ITE
- Japan
- Japanese
- LLM
- NLP
- answer
- answers
- artificial intelligence
- chatbot
- chatbots
- clinical
- clinical training
- conversational agent
- conversational agents
- exam
- examination
- examinations
- exams
- language model
- language models
- medical education
- natural language processing
- non-English language
- performance
- reasoning
- residency programs
- response
- responses
- self-assessment
Cite
Watari, T., Takagi, S., Sakaguchi, K., Nishizaki, Y., Shimizu, T., Yamamoto, Y., & Tokuda, Y. (2023). Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Medical Education, 9(1). https://doi.org/10.2196/52202