Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study

Hiromizu Takahashi; Kiyoshi Shikino; Takeshi Kondo; Akira Komori; Yuji Yamada; Mizue Saita; Toshio Naito

Journal ArticleOPEN ACCESS

Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study

JMIR Medical Education (2024) 10

DOI: 10.2196/59133

0Citations

28Readers

Get full text

Abstract

Background: Evaluating the accuracy and educational utility of artificial intelligence–generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored. Objective: This study aimed to assess the educational utility of ChatGPT-4–generated clinical vignettes and their applicability in educational settings. Methods: Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, which are information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians’ experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases. Results: Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P

Author supplied keywords

Cite

CITATION STYLE

APA

Takahashi, H., Shikino, K., Kondo, T., Komori, A., Yamada, Y., Saita, M., & Naito, T. (2024). Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Medical Education, 10. https://doi.org/10.2196/59133

Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study

Abstract

Author supplied keywords

Cite

Register to see more suggestions