Empirical Perturbation Analysis of Two Adversarial Attacks: Black Box versus White Box


Abstract

Through the addition of humanly imperceptible noise to an image originally classified into some ancestor category, targeted adversarial attacks can lead convolutional neural networks (CNNs) to classify the modified image as belonging to any predefined target class. To achieve a better understanding of the inner workings of adversarial attacks, this study analyzes the adversarial images created by two fundamentally different attacks against 10 ImageNet-trained CNNs: a black-box attack based on an evolutionary algorithm (EA) and the basic iterative method (BIM), a white-box, gradient-based attack. We inspect and compare these two sets of adversarial images from several perspectives: the behavior of CNNs on smaller image regions, the frequency content of the adversarial noise, the transferability of the adversarial images, changes in image texture, and activations in the penultimate CNN layer. We find that texture change is a side effect of the attacks rather than a means to their end, and that target-class-relevant features only build up significantly beyond a certain region size. In the penultimate CNN layer, both attacks increase the activation of units positively related to the target class as well as of units negatively related to the ancestor class. In contrast to the EA's white-noise perturbations, BIM predominantly introduces low-frequency noise. BIM also affects the original ancestor-class features more strongly than the EA does, thus producing slightly more transferable adversarial images. However, the transferability of both attacks is low, since the target-class-related information they introduce is specific to the output layers of the attacked CNN. We find that the adversarial images are actually more transferable at smaller region sizes than at full scale.

Citation (APA)

Chitic, R., Topal, A. O., & Leprévost, F. (2022). Empirical perturbation analysis of two adversarial attacks: Black box versus white box. Applied Sciences (Switzerland), 12(14), 7339. https://doi.org/10.3390/app12147339
