Abstract
Several recent works have shown that large language models present privacy risks through memorization of training data. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how memorization risk varies across different fine-tuning methods (such as fine-tuning the full model, the model head, or adapters). This is an increasing concern as the “pre-train and fine-tune” paradigm proliferates. We empirically study memorization of fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to these attacks differs markedly. We observe that fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
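To make the attack setting concrete, below is a minimal sketch of a loss-based (likelihood-ratio style) membership inference attack on a fine-tuned causal language model, comparing the fine-tuned target against its pre-trained reference. The model names, fine-tuned checkpoint path, and threshold are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a loss-based membership inference attack on a fine-tuned LM.
# Model names, the fine-tuned checkpoint path, and the threshold below are
# hypothetical placeholders, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    """Average per-token negative log-likelihood of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def membership_score(target, reference, tokenizer, text):
    """Likelihood-ratio style score: a lower loss under the fine-tuned target
    relative to the pre-trained reference suggests the sample was seen
    during fine-tuning."""
    return sequence_loss(reference, tokenizer, text) - sequence_loss(target, tokenizer, text)

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    # Hypothetical path to a model fine-tuned on private data.
    target = AutoModelForCausalLM.from_pretrained("path/to/fine-tuned-gpt2").eval()

    candidate = "Example sentence whose membership we want to test."
    score = membership_score(target, reference, tok, candidate)
    # Flag as a likely training member if the score exceeds a calibrated threshold.
    print("member" if score > 0.5 else "non-member", score)
```

The same scoring procedure can be applied to models fine-tuned with different methods (full model, head only, adapters) to compare how easily each one's training members are distinguished.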
Citation
Mireshghallah, F., Uniyal, A., Wang, T., Evans, D., & Berg-Kirkpatrick, T. (2022). An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 1816–1826). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.119