Retrieval-Augmented Generation vs. Baseline LLMs: A Multi-Metric Evaluation for Knowledge-Intensive Content

Abstract

(1) Background: The development of Generative Artificial Intelligence (GenAI) is transforming knowledge-intensive domains such as education. However, Large Language Models (LLMs), the foundational components of GenAI tools, are trained on static datasets and often produce misleading, factually incorrect, or outdated responses. Our study explores the performance gains of Retrieval-Augmented LLMs over baseline LLMs and identifies the trade-off between smaller-parameter LLMs augmented with user-specific data and larger-parameter LLMs. (2) Methods: We generated outputs with four LLMs, each with a different number of parameters. These outputs were then evaluated across seven lexical and semantic metrics to identify performance trends in Retrieval-Augmented Generation (RAG)-Augmented LLMs and to analyze the impact of parameter size on LLM performance. (3) Results and Discussions: We synthesized 968 different combinations to identify this trend across LLMs of different sizes: TinyLlama 1.1B, Mistral 7B, Llama 3.1 8B, and Llama 1 13B. The results were grouped into two themes: the percentage improvement of RAG-Augmented LLMs over baseline LLMs, and the trade-off of RAG-Augmented smaller-parameter LLMs against larger-parameter LLMs. Our experiments show that RAG-Augmented LLMs achieve higher lexical and semantic scores than baseline LLMs. This makes RAG augmentation a compelling trade-off for reducing the number of parameters in LLMs and lowering overall resource demands. (4) Conclusions: The findings indicate that, with RAG augmentation, smaller-parameter LLMs can perform as well as or better than larger-parameter LLMs, with particularly strong lexical improvements. RAG-Augmented LLMs also reduce the risk of hallucination and keep the output more contextualized, making them a better choice for knowledge-intensive content in academic and research sectors.
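The abstract reports lexical scores and percentage improvements of RAG-Augmented over baseline outputs without naming the seven metrics. As a minimal sketch only, the Python below shows how one lexical score and a relative improvement could be computed. The unigram-overlap F1, the function names `token_f1` and `percent_improvement`, and the toy strings are all assumptions made for illustration; they are not taken from the article and are not claimed to be the authors' metrics or data.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated answer and a reference answer.

    Illustrative lexical metric (an assumption, not one named in the abstract).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def percent_improvement(baseline: float, augmented: float) -> float:
    """Relative change of the augmented score over the baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (augmented - baseline) / baseline * 100.0


# Toy strings invented for this sketch; they are not data from the study.
reference = "the course runs for twelve weeks"
baseline_output = "the course runs for ten weeks or so"
rag_output = "the course runs for twelve weeks"

base = token_f1(baseline_output, reference)
rag = token_f1(rag_output, reference)
print(f"baseline F1 = {base:.3f}, RAG F1 = {rag:.3f}")
print(f"improvement = {percent_improvement(base, rag):.1f}%")
```

A semantic metric would replace `token_f1` with a similarity computed over embeddings, while `percent_improvement` would stay the same; comparing a small RAG-Augmented model against a larger baseline model uses the same two scores per model.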

Cite

Vinayan Kozhipuram, A., Shailendra, S., & Kadel, R. (2025). Retrieval-Augmented Generation vs. Baseline LLMs: A Multi-Metric Evaluation for Knowledge-Intensive Content. Information (Switzerland), 16(9). https://doi.org/10.3390/info16090766
