Fine-tuning all parameters of large language models (LLMs) requires significant computational resources and is time-consuming. Recent parameter-efficient tuning methods such as Adapter tuning, Prefix tuning, and LoRA update only a small subset of parameters in large language models. However, they save only about 30% of the training memory because gradient computation and backpropagation are still required. This paper proposes a novel parameter-efficient tuning method for LLMs that does not require calculating their gradients. Leveraging the discernible similarities between the parameter-efficient modules learned for the same task by large and small language models, we put forward a strategy for transferring the parameter-efficient modules initially derived from small language models to much larger ones. To ensure a smooth and effective adaptation process, we introduce a Bridge model that guarantees dimensional consistency while enabling a dynamic interaction between the models. We demonstrate the effectiveness of our method using the T5 and GPT-2 series of language models on the SuperGLUE benchmark. Our method achieves performance comparable to fine-tuning and parameter-efficient tuning on large language models without needing gradient-based optimization. Additionally, it achieves up to 5.7× memory reduction compared to parameter-efficient tuning.
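The core idea, transferring a parameter-efficient module learned on a small model to a larger one via a Bridge that reconciles the dimensional mismatch, can be pictured with a minimal PyTorch sketch. This is only an illustration under assumptions: the paper's exact Bridge construction is not reproduced here, the LoRA-style module and the linear projections (bridge_in, bridge_out), the function plugin_forward, and all dimensions are hypothetical names and values chosen for the example.

    import torch
    import torch.nn as nn

    # Illustrative sizes: a "small" and a "large" hidden width, and a low rank.
    D_SMALL, D_LARGE, RANK = 512, 1024, 8

    # LoRA-style module assumed to have been trained on the small language model.
    lora_down_small = nn.Linear(D_SMALL, RANK, bias=False)
    lora_up_small   = nn.Linear(RANK, D_SMALL, bias=False)

    # Hypothetical Bridge: linear maps that make the small-model module
    # dimensionally compatible with the large model's hidden states.
    bridge_in  = nn.Linear(D_LARGE, D_SMALL, bias=False)
    bridge_out = nn.Linear(D_SMALL, D_LARGE, bias=False)

    def plugin_forward(hidden_large: torch.Tensor) -> torch.Tensor:
        """Apply the small model's module to the large model's hidden states.

        The large model itself stays frozen, so no gradients need to be
        computed or backpropagated through it.
        """
        h_small = bridge_in(hidden_large)                  # project down
        delta   = lora_up_small(lora_down_small(h_small))  # reuse small-model module
        return hidden_large + bridge_out(delta)            # project back and add

    if __name__ == "__main__":
        x = torch.randn(2, 16, D_LARGE)   # (batch, sequence, hidden)
        with torch.no_grad():             # no backpropagation through the LLM
            y = plugin_forward(x)
        print(y.shape)                    # torch.Size([2, 16, 1024])

In this sketch only the plug-in path touches the large model's activations, which is what lets the approach avoid gradient-based optimization of, and backpropagation through, the large model itself.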
Citation
Jin, F., Zhang, J., & Zong, C. (2023). Parameter-efficient Tuning for Large Language Model without Calculating Its Gradients. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 321–330). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.22