Assessment of a Large Language Model's Responses to Questions and Cases about Glaucoma and Retina Management

25Citations
Citations of this article
40Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Importance: Large language models (LLMs) are revolutionizing medical diagnosis and treatment, offering unprecedented accuracy and ease surpassing conventional search engines. Their integration into medical assistance programs will become pivotal for ophthalmologists as an adjunct for practicing evidence-based medicine. Therefore, the diagnostic and treatment accuracy of LLM-generated responses compared with fellowship-trained ophthalmologists can help assess their accuracy and validate their potential utility in ophthalmic subspecialties. Objective: To compare the diagnostic accuracy and comprehensiveness of responses from an LLM chatbot with those of fellowship-trained glaucoma and retina specialists on ophthalmological questions and real patient case management. Design, Setting, and Participants: This comparative cross-sectional study recruited 15 participants aged 31 to 67 years, including 12 attending physicians and 3 senior trainees, from eye clinics affiliated with the Department of Ophthalmology at Icahn School of Medicine at Mount Sinai, New York, New York. Glaucoma and retina questions (10 of each type) were randomly selected from the American Academy of Ophthalmology's commonly asked questions Ask an Ophthalmologist. Deidentified glaucoma and retinal cases (10 of each type) were randomly selected from ophthalmology patients seen at Icahn School of Medicine at Mount Sinai-affiliated clinics. The LLM used was GPT-4 (version dated May 12, 2023). Data were collected from June to August 2023. Main Outcomes and Measures: Responses were assessed via a Likert scale for medical accuracy and completeness. Statistical analysis involved the Mann-Whitney U test and the Kruskal-Wallis test, followed by pairwise comparison. Results: The combined question-case mean rank for accuracy was 506.2 for the LLM chatbot and 403.4 for glaucoma specialists (n = 831; Mann-Whitney U = 27976.5; P

Cite

CITATION STYLE

APA

Huang, A. S., Hirabayashi, K., Barna, L., Parikh, D., & Pasquale, L. R. (2024). Assessment of a Large Language Model’s Responses to Questions and Cases about Glaucoma and Retina Management. JAMA Ophthalmology, 142(4), 371–375. https://doi.org/10.1001/jamaophthalmol.2023.6917

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free