A Retrieval-Augmented Generation Based Large Language Model Benchmarked On a Novel Dataset

  • Pichai K

Abstract

The evolution of natural language processing has seen marked advancements, particularly with the advent of models like BERT, Transformers, and GPT variants, along with recent additions such as Bard. This paper investigates the Retrieval-Augmented Generation (RAG) framework, providing insights into its modular design and the impact of its constituent modules on performance. Leveraging a unique dataset from Amazon Rainforest natives and biologists, our research demonstrates the significance of preserving indigenous cultures and biodiversity. The experiment employs a customizable RAG methodology, allowing for the interchangeability of various components, such as the base language model and the similarity score tools. Findings indicate that while GPT performs slightly better when given context, PaLM exhibits superior performance without context. The results also suggest that models tend to perform optimally when paired with similarity scores from their native platforms. In conclusion, our approach showcases the potential of a modular RAG design for optimizing language models, presenting it as a more advantageous strategy than traditional fine-tuning of large language models.
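To make the modular design concrete, below is a minimal sketch of a RAG pipeline with pluggable components. All names here (RAGPipeline, cosine_bow, the stub generator) are illustrative assumptions, not the paper's actual implementation; the toy bag-of-words similarity merely stands in for the platform-native similarity scorers the paper benchmarks.

```python
# Minimal sketch of a modular RAG pipeline, assuming pluggable components.
# These names are hypothetical illustrations, not the paper's code.
from collections import Counter
from math import sqrt
from typing import Callable, List


def cosine_bow(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity; a stand-in for a platform-native
    similarity scorer (e.g. an embedding model from the LLM's own platform)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


class RAGPipeline:
    """Retrieves the top-k passages by similarity, then prompts a base LLM."""

    def __init__(self, corpus: List[str],
                 similarity: Callable[[str, str], float],
                 generate: Callable[[str], str],
                 k: int = 2):
        self.corpus = corpus
        self.similarity = similarity  # interchangeable similarity score tool
        self.generate = generate      # wraps any base model (GPT, PaLM, ...)
        self.k = k

    def answer(self, question: str) -> str:
        # Rank passages against the question and keep the k most similar.
        ranked = sorted(self.corpus,
                        key=lambda p: self.similarity(question, p),
                        reverse=True)
        context = "\n".join(ranked[: self.k])
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.generate(prompt)


# Usage with a stub generator; in practice `generate` would call a hosted LLM.
corpus = [
    "The kapok tree is among the tallest in the Amazon rainforest.",
    "Indigenous communities use cassava as a dietary staple.",
]
rag = RAGPipeline(corpus, similarity=cosine_bow,
                  generate=lambda p: f"[LLM response to a {len(p)}-char prompt]")
print(rag.answer("What do indigenous Amazon communities eat?"))
```

Swapping a different function into `similarity` or `generate` exercises exactly the interchangeability the abstract describes: the retrieval scorer and the base model vary independently of the surrounding pipeline.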

Citation (APA)

Pichai, K. (2023). A Retrieval-Augmented Generation Based Large Language Model Benchmarked On a Novel Dataset. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.6213
