RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks

Abstract

Adversarial attacks on deep neural networks continue to raise security concerns in natural language processing research. Existing defenses focus on improving the robustness of the victim model during training, but they often neglect to proactively mitigate adversarial attacks during inference. Addressing this overlooked aspect, we propose a defense framework that mitigates attacks by confusing attackers and correcting adversarial contexts caused by malicious perturbations. Our framework comprises three components: (1) a synonym-based transformation that randomly corrupts adversarial contexts at the word level, (2) a BERT-based defender that corrects abnormal contexts at the representation level, and (3) a simple detection method that filters out adversarial examples; these components can be flexibly combined. Additionally, our framework helps improve the robustness of the victim model during training. Extensive experiments demonstrate the effectiveness of our framework in defending against word-level adversarial attacks.
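To make the first component concrete, below is a minimal sketch of a synonym-based random transformation applied at inference time, in the spirit of component (1) above. It is not the paper's implementation: the toy `SYNONYMS` table, the `randomly_corrupt` function, and the `swap_prob` parameter are illustrative assumptions standing in for the framework's actual synonym set and sampling scheme. The idea it shows is that stochastic synonym substitution at query time makes the input seen by the victim model shift between queries, which confuses iterative word-level attackers.

```python
import random

# Hypothetical toy synonym table for illustration only; the actual framework
# derives its candidate substitutions differently.
SYNONYMS = {
    "good": ["great", "fine", "decent"],
    "bad": ["poor", "awful", "terrible"],
    "movie": ["film", "picture"],
}


def randomly_corrupt(tokens, swap_prob=0.3, rng=None):
    """Randomly replace words with synonyms to perturb adversarial contexts.

    Each token is swapped for a random synonym with probability `swap_prob`,
    so repeated queries by an attacker see different surface forms.
    """
    rng = rng or random.Random()
    corrupted = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok.lower())
        if candidates and rng.random() < swap_prob:
            corrupted.append(rng.choice(candidates))
        else:
            corrupted.append(tok)
    return corrupted


if __name__ == "__main__":
    sentence = "the movie was not bad".split()
    # Fixed seed only so the example output is reproducible.
    print(randomly_corrupt(sentence, swap_prob=0.5, rng=random.Random(0)))
```

In the full framework described by the abstract, such word-level corruption would be followed by the representation-level correction (the BERT-based defender) and, optionally, the adversarial-example detector.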

Citation (APA)
Wang, Z., Liu, Z., Zheng, X., Su, Q., & Wang, J. (2023). RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 2757–2774). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.155
