An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models

65 Citations · 39 Mendeley Readers

Abstract

Several recent works have shown that large language models present privacy risks through memorization of training data. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how memorization risk varies across different fine-tuning methods (such as fine-tuning the full model, only the model head, or small adapters). This is of increasing concern as the “pre-train and fine-tune” paradigm proliferates. We empirically study memorization under these fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks differs substantially. We observe that fine-tuning only the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
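
To make the kind of attack studied here concrete, below is a minimal sketch of a loss-based membership inference score that compares a fine-tuned causal language model against its pre-trained base: a candidate sequence whose loss drops noticeably after fine-tuning is more likely to have been in the fine-tuning set. This is one common form of membership inference, not necessarily the exact attack used in the paper; the model names, the fine-tuned checkpoint path, and the candidate text are illustrative placeholders, and a Hugging Face Transformers setup is assumed.

# Minimal sketch (assumptions: Transformers + PyTorch installed, "gpt2" as the
# base model, "./gpt2-finetuned" as a hypothetical fine-tuned checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text, device="cpu"):
    # Average per-token cross-entropy of `text` under `model`.
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def membership_score(finetuned, base, tokenizer, text):
    # Positive score: the fine-tuned model assigns `text` a lower loss than the
    # base model, which is weak evidence the text was seen during fine-tuning.
    return sequence_loss(base, tokenizer, text) - sequence_loss(finetuned, tokenizer, text)

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    base = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    ft = AutoModelForCausalLM.from_pretrained("./gpt2-finetuned").eval()  # hypothetical path
    candidate = "Example sentence that may appear in the fine-tuning data."
    print("membership score:", membership_score(ft, base, tok, candidate))

In practice such a score is thresholded (or ranked against scores for known non-members) to decide membership, and the same scoring loop can be run against checkpoints produced by full, head-only, or adapter fine-tuning to compare their susceptibility.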

Citation (APA)

Mireshghallah, F., Uniyal, A., Wang, T., Evans, D., & Berg-Kirkpatrick, T. (2022). An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 1816–1826). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.119
