Abstract
Parameter-Efficient Tuning (PET) achieves remarkable performance by fine-tuning only a small number of parameters of pre-trained language models (PLMs) for downstream tasks, yet the vulnerability of the pre-trained weights also makes it possible to construct backdoor attacks. However, the sharp reduction in the number of attackable parameters under PET means that the user's subsequent fine-tuning can greatly weaken the attack, resulting in backdoor forgetting. We find that the backdoor injection process can be regarded as multi-task learning, which suffers from a convergence imbalance between training on clean and poisoned data, and this imbalance can cause the backdoor to be forgotten. Based on this finding, we propose a gradient control method to consolidate the attack effect, comprising two strategies: one controls the cross-layer distribution of gradient magnitudes within a task, and the other prevents conflicts between the gradient directions of the two tasks. Compared with previous backdoor attack methods in the PET scenario, our method improves attack effectiveness on both sentiment classification and spam detection, showing that it applies broadly across different tasks.
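The abstract does not detail how the second strategy resolves conflicting gradient directions between the clean and poisoned tasks. A common realization of this general idea is PCGrad-style projection: when two task gradients conflict (negative dot product), one is projected onto the plane orthogonal to the other. The sketch below is an illustrative assumption, not the paper's exact method; the function name and example gradients are hypothetical.

```python
import numpy as np

def combine_task_gradients(g_clean, g_poison):
    """Combine clean-task and poisoned-task gradients, removing the
    conflicting component (a PCGrad-style sketch, not the paper's method).

    If the gradients conflict (negative dot product), project g_poison
    onto the plane orthogonal to g_clean before summing, so the
    poisoned-task update no longer opposes the clean-task update.
    """
    dot = np.dot(g_clean, g_poison)
    if dot < 0:
        # Subtract the component of g_poison along g_clean.
        g_poison = g_poison - (dot / np.dot(g_clean, g_clean)) * g_clean
    return g_clean + g_poison

# Conflicting case: g_poison has a component opposing g_clean.
combined = combine_task_gradients(np.array([1.0, 0.0]), np.array([-1.0, 1.0]))
# The opposing component is removed: combined is [1.0, 1.0].
```

In the non-conflicting case (non-negative dot product) the gradients are simply summed, so the projection only activates when the tasks would otherwise pull the shared parameters in opposing directions.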
Gu, N., Fu, P., Liu, X., Liu, Z., Lin, Z., & Wang, W. (2023). A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3508–3520). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.194