Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology

Lars Masanneck; Sven G. Meuth; Marc Pawlitzki

Journal Article (Open Access)

npj Digital Medicine (2025) 8(1)

DOI: 10.1038/s41746-025-01536-y

Abstract

Effectively managing evidence-based information is increasingly challenging. This study tested large language models (LLMs), including document- and online-enabled retrieval-augmented generation (RAG) systems, using 13 recent neurology guidelines across 130 questions. Results showed substantial variability. RAG improved accuracy compared to base models but still produced potentially harmful answers. RAG-based systems performed worse on case-based than knowledge-based questions. Further refinement and improved regulation are needed for safe clinical integration of RAG-enhanced LLMs.

Cite

Masanneck, L., Meuth, S. G., & Pawlitzki, M. (2025). Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology. npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01536-y
