We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges. Both models' accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) "Diagnosis This" questions. Additional analysis was based on image content, question difficulty, character length of models' responses, and model's alignment with responses from human respondents. χ2 test, Fisher's exact test, Student's t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant. GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0's answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0's answer (ρ=0.713, p<0.01). Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education. Copyright © 2023, Jiao et al.
CITATION STYLE
Jiao, C., Edupuganti, N. R., Patel, P. A., Bui, T., & Sheth, V. (2023). Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge. Cureus. https://doi.org/10.7759/cureus.45700
Mendeley helps you to discover research relevant for your work.