A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning

Abstract

Parameter-Efficient Tuning (PET) achieves remarkable performance by fine-tuning only a small number of parameters of pre-trained language models (PLMs) for downstream tasks, yet the vulnerability of pre-trained weights still makes it possible to construct backdoor attacks. However, the large reduction in the number of attackable parameters under PET means that the user's subsequent fine-tuning strongly degrades the attack, causing the backdoor to be forgotten. We find that the backdoor injection process can be regarded as multi-task learning, in which training on clean data and on poisoned data converge at different rates; this convergence imbalance can cause the backdoor to be forgotten. Based on this finding, we propose a gradient control method that consolidates the attack effect, comprising two strategies: one controls the distribution of gradient magnitudes across layers within a task, and the other prevents conflicts between the gradient directions of the two tasks. Compared with previous backdoor attack methods in the PET setting, our method improves attack effectiveness on both sentiment classification and spam detection, showing that it applies broadly across tasks.

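The abstract does not reproduce the paper's exact formulas, but the two strategies can be illustrated with a rough PyTorch sketch. The sketch below treats clean and poisoned training as two tasks and resolves direction conflicts with a PCGrad-style projection, plus a simple per-layer rescaling as one possible reading of "controlling the gradient magnitude distribution across layers." Both function names and the specific formulas are hypothetical, not the authors' method.

import torch

def merge_task_gradients(grad_clean, grad_poison):
    """Combine per-parameter gradients from the clean and poisoned batches.

    If the two gradients conflict (negative inner product), project the
    poisoned-task gradient onto the normal plane of the clean-task gradient
    so it no longer opposes clean training (PCGrad-style; hypothetical here).
    """
    merged = []
    for g_c, g_p in zip(grad_clean, grad_poison):
        dot = torch.dot(g_c.flatten(), g_p.flatten())
        if dot < 0:  # conflicting directions between the two tasks
            g_p = g_p - dot / (g_c.norm() ** 2 + 1e-12) * g_c
        merged.append(g_c + g_p)
    return merged

def balance_layer_magnitudes(grads):
    """Rescale each layer's gradient toward the mean norm, so no single
    layer dominates the update (an assumed stand-in for the paper's
    cross-layer magnitude control)."""
    norms = [g.norm() for g in grads]
    target = torch.stack(norms).mean()
    return [g * (target / (n + 1e-12)) for g, n in zip(grads, norms)]

In use, an attacker would compute the two gradient lists from a clean batch and a poisoned batch, balance each list across layers, merge them, and apply the result as the parameter update for the small set of PET parameters.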
Citation (APA)

Gu, N., Fu, P., Liu, X., Liu, Z., Lin, Z., & Wang, W. (2023). A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3508–3520). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.194
