Backdoor attacks inject maliciously constructed data into the training set of a machine learning model so that, at test time, the trained model misclassifies any input patched with a backdoor trigger as an attacker-desired output. For a backdoor attack to bypass human inspection, it is essential that the injected data appear correctly labeled; attacks with this property are often referred to as "clean-label attacks." The effectiveness of existing clean-label backdoor attacks crucially relies on knowledge of the entire training set. In practice, however, obtaining this knowledge is costly or impossible, as training data are often gathered from multiple independent sources (e.g., face images from different users). It thus remains an open question whether backdoor attacks still pose a real threat. In this paper, we provide an affirmative answer to this question by designing an algorithm that mounts clean-label backdoor attacks based only on knowledge of samples from the target class and public out-of-distribution data. By inserting maliciously crafted examples totaling just 0.5% of the target-class data size and 0.05% of the training set size, we can manipulate a model trained on this poisoned dataset to classify test examples from arbitrary classes into the target class when those examples are patched with a backdoor trigger; at the same time, the trained model maintains good accuracy on typical test examples without the trigger, as if it were trained on a clean dataset. The attack, which we call Narcissus, is highly effective across datasets and models, even when the trigger is injected into the physical world. We explore the space of defenses and find that Narcissus can evade the latest state-of-the-art defenses both in their vanilla form and after a simple adaptation. We study the cause of this intriguing effectiveness and find that the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class; as a result, any attempt to remove the trigger inevitably hurts model accuracy first.
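To make the threat model concrete, the following minimal Python/NumPy sketch illustrates the mechanics described in the abstract: a small, norm-bounded trigger is blended into roughly 0.5% of the target-class training images while their labels are left unchanged (the clean-label property), and the same trigger is applied to an arbitrary input at test time. The image shapes, the epsilon bound, and the random placeholder trigger are assumptions for illustration only; the actual Narcissus trigger-synthesis procedure (which uses target-class samples and public out-of-distribution data) is not reproduced here.

import numpy as np

# --- Hypothetical setup: shapes, bound, and data are placeholders, not from the paper ---
rng = np.random.default_rng(0)
target_class = 2                                                  # attacker-chosen target label
x_target = rng.random((1000, 32, 32, 3)).astype(np.float32)       # target-class images in [0, 1]
y_target = np.full(1000, target_class)                            # labels stay correct (clean-label)

# Placeholder trigger; Narcissus synthesizes it from target-class samples plus
# public out-of-distribution data (that optimization is not shown here).
trigger = rng.uniform(-1.0, 1.0, size=(32, 32, 3)).astype(np.float32)
epsilon = 16 / 255                                                # assumed L-infinity bound on the trigger

def apply_trigger(images, trigger, eps):
    """Add a norm-bounded trigger to images and clip back to the valid pixel range."""
    patched = images + np.clip(trigger, -eps, eps)
    return np.clip(patched, 0.0, 1.0)

# Poison only 0.5% of the target-class data; poisoned samples keep their original labels.
n_poison = int(0.005 * len(x_target))
poison_idx = rng.choice(len(x_target), size=n_poison, replace=False)
x_poisoned = x_target.copy()
x_poisoned[poison_idx] = apply_trigger(x_target[poison_idx], trigger, epsilon)
# (x_poisoned, y_target) would then be merged back into the full training set.

# At test time, patching any input with the trigger aims to force the target label.
x_test = rng.random((1, 32, 32, 3)).astype(np.float32)
x_test_triggered = apply_trigger(x_test, trigger, epsilon)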
Zeng, Y., Pan, M., Just, H. A., Lyu, L., Qiu, M., & Jia, R. (2023). Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. In CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (pp. 771–785). Association for Computing Machinery, Inc. https://doi.org/10.1145/3576915.3616617