Abstract
Parameter-Efficient Tuning (PET) achieves remarkable performance by fine-tuning only a small number of parameters of pre-trained language models (PLMs) for downstream tasks, yet the vulnerability of the pre-trained weights also makes it possible to construct backdoor attacks. However, the sharp reduction in the number of attackable parameters under PET means that the user's subsequent fine-tuning can greatly weaken the attack, resulting in backdoor forgetting. We find that the backdoor injection process can be regarded as multi-task learning, which suffers from a convergence imbalance between training on clean and poisoned data, and this imbalance can cause the backdoor to be forgotten. Based on this finding, we propose a gradient control method to consolidate the attack effect, comprising two strategies: one controls the cross-layer distribution of gradient magnitudes within a task, and the other prevents conflicts between the gradient directions of the two tasks. Compared with previous backdoor attack methods in the PET scenario, our method improves attack effectiveness on both sentiment classification and spam detection, showing that it applies broadly across different tasks.
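The abstract does not detail how the second strategy resolves conflicting gradient directions between the clean and poisoned tasks. A common realization of this general idea is PCGrad-style projection: when two task gradients conflict (negative dot product), one is projected onto the plane orthogonal to the other. The sketch below is an illustrative assumption, not the paper's exact method; the function name and example gradients are hypothetical.

```python
import numpy as np

def combine_task_gradients(g_clean, g_poison):
    """Combine clean-task and poisoned-task gradients, removing the
    conflicting component (a PCGrad-style sketch, not the paper's method).

    If the gradients conflict (negative dot product), project g_poison
    onto the plane orthogonal to g_clean before summing, so the
    poisoned-task update no longer opposes the clean-task update.
    """
    dot = np.dot(g_clean, g_poison)
    if dot < 0:
        # Subtract the component of g_poison along g_clean.
        g_poison = g_poison - (dot / np.dot(g_clean, g_clean)) * g_clean
    return g_clean + g_poison

# Conflicting case: g_poison has a component opposing g_clean.
combined = combine_task_gradients(np.array([1.0, 0.0]), np.array([-1.0, 1.0]))
# The opposing component is removed: combined is [1.0, 1.0].
```

In the non-conflicting case (non-negative dot product) the gradients are simply summed, so the projection only activates when the tasks would otherwise pull the shared parameters in opposing directions.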
Gu, N., Fu, P., Liu, X., Liu, Z., Lin, Z., & Wang, W. (2023). A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3508–3520). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.194