A Retrieval-Augmented Generation Based Large Language Model Benchmarked On a Novel Dataset

  • Pichai K

Abstract

The evolution of natural language processing has seen marked advancements, particularly with the advent of models like BERT, Transformers, and GPT variants, along with recent additions such as Bard. This paper investigates the Retrieval-Augmented Generation (RAG) framework, providing insights into its modular design and the impact of its constituent modules on performance. Leveraging a unique dataset from Amazon Rainforest natives and biologists, our research demonstrates the significance of preserving indigenous cultures and biodiversity. The experiment employs a customizable RAG methodology, allowing for the interchangeability of various components, such as the base language model and the similarity score tools. Findings indicate that while GPT performs slightly better when given context, PaLM exhibits superior performance without context. The results also suggest that models tend to perform optimally when paired with similarity scores from their native platforms. In conclusion, our approach showcases the potential of a modular RAG design for optimizing language models, presenting it as a more advantageous strategy than traditional fine-tuning of large language models.
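To make the modular design concrete, below is a minimal sketch of a RAG pipeline with pluggable components. All names here (RAGPipeline, cosine_bow, the stub generator) are illustrative assumptions, not the paper's actual implementation; the toy bag-of-words similarity merely stands in for the platform-native similarity scorers the paper benchmarks.

```python
# Minimal sketch of a modular RAG pipeline, assuming pluggable components.
# These names are hypothetical illustrations, not the paper's code.
from collections import Counter
from math import sqrt
from typing import Callable, List


def cosine_bow(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity; a stand-in for a platform-native
    similarity scorer (e.g. an embedding model from the LLM's own platform)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


class RAGPipeline:
    """Retrieves the top-k passages by similarity, then prompts a base LLM."""

    def __init__(self, corpus: List[str],
                 similarity: Callable[[str, str], float],
                 generate: Callable[[str], str],
                 k: int = 2):
        self.corpus = corpus
        self.similarity = similarity  # interchangeable similarity score tool
        self.generate = generate      # wraps any base model (GPT, PaLM, ...)
        self.k = k

    def answer(self, question: str) -> str:
        # Rank passages against the question and keep the k most similar.
        ranked = sorted(self.corpus,
                        key=lambda p: self.similarity(question, p),
                        reverse=True)
        context = "\n".join(ranked[: self.k])
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.generate(prompt)


# Usage with a stub generator; in practice `generate` would call a hosted LLM.
corpus = [
    "The kapok tree is among the tallest in the Amazon rainforest.",
    "Indigenous communities use cassava as a dietary staple.",
]
rag = RAGPipeline(corpus, similarity=cosine_bow,
                  generate=lambda p: f"[LLM response to a {len(p)}-char prompt]")
print(rag.answer("What do indigenous Amazon communities eat?"))
```

Swapping a different function into `similarity` or `generate` exercises exactly the interchangeability the abstract describes: the retrieval scorer and the base model vary independently of the surrounding pipeline.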

Citation (APA)

Pichai, K. (2023). A Retrieval-Augmented Generation Based Large Language Model Benchmarked On a Novel Dataset. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.6213
