Abstract
Effectively managing evidence-based information is increasingly challenging. This study tested large language models (LLMs), including document- and online-enabled retrieval-augmented generation (RAG) systems, using 13 recent neurology guidelines across 130 questions. Results showed substantial variability. RAG improved accuracy compared to base models but still produced potentially harmful answers. RAG-based systems performed worse on case-based than on knowledge-based questions. Further refinement and improved regulation are needed for safe clinical integration of RAG-enhanced LLMs.
Masanneck, L., Meuth, S. G., & Pawlitzki, M. (2025). Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology. npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01536-y