On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

Citations: 1 · Mendeley readers: 12

Abstract

Detecting adversarial samples that are carefully crafted to fool a model is a critical step toward building secure applications. However, existing adversarial detection methods require access to sufficient training data, which raises notable concerns about privacy leakage and generalizability. In this work, we validate that adversarial samples generated by attack algorithms are strongly related to specific vectors in the high-dimensional input space. Such vectors, namely Universal Adversarial Perturbations (UAPs), can be computed without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework that separates normal from adversarial samples by the different responses they induce when perturbed with UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks while keeping time consumption on par with normal inference.

Cite

APA: Gao, S., Dou, S., Zhang, Q., Huang, X., Ma, J., & Shan, Y. (2023). On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 13573–13581). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.857
