Large language model performance in ophthalmic patient education: Amblyopia and age-related macular degeneration
Poster Presentation 23.476: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Action: Miscellaneous
Sunwoo Kwon1,2,3, Artashes Yeritsyan2, Dennis Levi2,3,4; 1Exponent, 2Herbert Wertheim School of Optometry and Vision Science, 3Center for Innovation in Vision and Optics, 4Helen Wills Neuroscience Institute
Artificial intelligence (AI) chatbots are increasingly used for patient education, with 22% of Americans reporting having sought health advice from these tools. However, the reliability of chatbot-generated ophthalmic information remains unclear. We conducted a multi-condition evaluation of chatbot responses to patient-focused questions across two prevalent disorders of the eye and visual system: amblyopia and age-related macular degeneration (AMD). We assessed the accuracy, comprehensiveness, and readability of these responses against patient materials from the American Academy of Ophthalmology (AAO) and the American Optometric Association (AOA). Two condition-specific question sets were constructed from AAO/AOA patient brochures: 21 amblyopia questions and 12 AMD questions. Each question was entered twice into six publicly available AI chatbots (ChatGPT-3.5, ChatGPT-4, Gemini, Meta AI, Snap AI, and Copilot), yielding 252 amblyopia and 144 AMD responses. Using a 5-point Likert scale, amblyopia responses were rated by three optometrists with expertise in amblyopia, and AMD responses by five optometrists with expertise in retinal disease. Accuracy and comprehensiveness were analyzed with the Friedman test and post hoc Wilcoxon signed-rank tests, while readability was analyzed with ANOVA and post hoc Tukey HSD tests. Because the question sets and raters differed, the two datasets were analyzed independently and synthesized qualitatively. Across both conditions, GPT-3.5, GPT-4, Copilot, and Gemini consistently produced more accurate and comprehensive responses than Meta AI, Snap AI, and the AAO/AOA brochures. With the exception of Copilot, all chatbots produced significantly harder-to-read text than the AAO/AOA patient brochures for both conditions. Our results demonstrate that across two ophthalmic conditions, AI chatbots, particularly GPT-3.5, GPT-4, Copilot, and Gemini, outperformed AAO/AOA materials in accuracy and comprehensiveness but displayed persistent readability challenges. Collectively, these findings delineate both the emerging capabilities and current limitations of AI systems in ophthalmic patient education.
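The analysis described above maps onto standard library routines. Below is a minimal sketch in Python using scipy and statsmodels; the rating matrix, readability scores, and the Bonferroni correction on the pairwise Wilcoxon tests are hypothetical placeholders standing in for the study's actual data and exact procedure, which the abstract does not fully specify.

```python
# Sketch of the abstract's statistical pipeline: Friedman + post hoc Wilcoxon
# signed-rank tests for Likert ratings, ANOVA + Tukey HSD for readability.
# All data below are synthetic placeholders, not the study's measurements.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
chatbots = ["GPT-3.5", "GPT-4", "Gemini", "Meta AI", "Snap AI", "Copilot"]

# Accuracy ratings: rows = 21 amblyopia questions, columns = chatbots,
# values = rater-averaged 1-5 Likert scores (hypothetical).
ratings = rng.integers(1, 6, size=(21, len(chatbots)))

# Friedman test: nonparametric repeated-measures comparison across chatbots.
chi2, p = stats.friedmanchisquare(*(ratings[:, j] for j in range(len(chatbots))))
print(f"Friedman chi2 = {chi2:.2f}, p = {p:.4f}")

# Post hoc pairwise Wilcoxon signed-rank tests. A Bonferroni correction is a
# common follow-up; the abstract does not state which correction was used.
n_pairs = len(chatbots) * (len(chatbots) - 1) // 2
for i in range(len(chatbots)):
    for j in range(i + 1, len(chatbots)):
        _, p_ij = stats.wilcoxon(ratings[:, i], ratings[:, j])
        print(f"{chatbots[i]} vs {chatbots[j]}: p = {min(p_ij * n_pairs, 1.0):.4f}")

# Readability: one score per response (e.g., a grade-level metric; the
# abstract does not name the formula), compared with ANOVA and Tukey HSD.
scores = rng.normal(11, 2, size=60)        # hypothetical readability scores
groups = np.repeat(chatbots, 10)           # 10 responses per chatbot
f, p_anova = stats.f_oneway(*(scores[groups == c] for c in chatbots))
print(f"ANOVA F = {f:.2f}, p = {p_anova:.4f}")
print(pairwise_tukeyhsd(scores, groups))   # pairwise Tukey HSD summary table
```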