Natural language processing has advanced markedly with the advent of Transformer-based models such as BERT and the GPT family, and, more recently, systems like Bard. This paper investigates the Retrieval-Augmented Generation (RAG) framework, providing insight into its modular design and the impact of its constituent modules on performance. Leveraging a novel dataset gathered from Amazon Rainforest natives and biologists, our research also demonstrates the significance of preserving indigenous cultures and biodiversity. The experiment employs a customizable RAG methodology that allows individual components, such as the base language model and the similarity scoring tool, to be interchanged. Findings indicate that while GPT performs slightly better when given context, PaLM exhibits superior performance without context. The results also suggest that models tend to perform best when paired with similarity scores from their native platforms. In conclusion, our approach showcases the potential of a modular RAG design for optimizing language models, presenting it as a more advantageous strategy than traditional fine-tuning of large language models.
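The abstract does not spell out the implementation, but the modular design it describes can be sketched roughly as below. All names here (ModularRAG, Embedder, toy_embed, and so on) are illustrative assumptions, not the paper's code; the swappable embed, similarity, and generate callables stand in for the interchangeable components the paper benchmarks, such as GPT versus PaLM as the base model and each platform's native similarity score.

```python
# Hypothetical sketch of a modular RAG pipeline in the spirit of the paper.
# The base language model and the similarity scorer are swappable components;
# none of these names come from the paper itself.
from dataclasses import dataclass
from typing import Callable, List
import math

Embedder = Callable[[str], List[float]]
Similarity = Callable[[List[float], List[float]], float]
Generator = Callable[[str], str]

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """One interchangeable similarity score; a platform-native score could be dropped in instead."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class ModularRAG:
    embed: Embedder          # e.g. an embedding endpoint from the model's platform
    similarity: Similarity   # e.g. that platform's native similarity score
    generate: Generator      # e.g. GPT or PaLM as the base language model
    corpus: List[str]

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Rank every document in the corpus against the query embedding.
        q = self.embed(query)
        ranked = sorted(self.corpus,
                        key=lambda doc: self.similarity(q, self.embed(doc)),
                        reverse=True)
        return ranked[:k]

    def answer(self, query: str) -> str:
        # Prepend the retrieved passages as context before generating.
        context = "\n".join(self.retrieve(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return self.generate(prompt)

if __name__ == "__main__":
    # Toy demo with a trivial letter-count embedder and a placeholder "model",
    # just to exercise the plumbing end to end.
    def toy_embed(text: str) -> List[float]:
        return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

    rag = ModularRAG(embed=toy_embed,
                     similarity=cosine_similarity,
                     generate=lambda p: p.splitlines()[0],  # stand-in for a real LLM call
                     corpus=["Medicinal plants of the Amazon.",
                             "Indigenous knowledge of rainforest ecology."])
    print(rag.retrieve("amazon plants", k=1))
```

Representing each component as a plain callable is what makes the pairing experiment the abstract describes cheap to run: swapping one platform's model and similarity score for another's is a change to the constructor arguments, not to the pipeline.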
Pichai, K. (2023). A Retrieval-Augmented Generation Based Large Language Model Benchmarked On a Novel Dataset. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.6213