Optimized Adversarial Example With Classification Score Pattern Vulnerability Removed

Citations: 2
Mendeley readers: 6

This article is free to access.

Abstract

Neural networks deliver excellent performance on recognition tasks such as image and speech recognition, as well as on pattern analysis and other tasks in fields related to artificial intelligence. However, neural networks are vulnerable to adversarial examples. An adversarial example is a sample, created by applying a minimal perturbation to a legitimate sample, that is designed to be misclassified by a target model while remaining correctly recognizable to humans. Because the perturbation used to create an optimized adversarial example is minimized, the classification score for the target class ends up close to that for the legitimate class: the perturbation is applied only until the target-class score slightly exceeds the legitimate-class score. Given this regularity in the classification scores, an optimized adversarial example is easy to detect by looking for the pattern. The existing methods for generating optimized adversarial examples, however, do not account for this weakness of being detectable through the classification score pattern. To address it, we propose an optimized adversarial example generation method that removes the classification score pattern vulnerability: a minimal perturbation is applied to a legitimate sample such that the classification score for the legitimate class falls below those of several other classes, yielding an optimized adversarial example without the telltale pattern. The results show that, using 500 iterations, the proposed method can generate an optimized adversarial example with a 100% attack success rate, with distortions of 2.81 and 2.23 for MNIST and Fashion-MNIST, respectively.
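The core idea of the abstract, changing the stopping rule of an iterative attack so the legitimate class's score is pushed below several other classes rather than just slightly below the target class, can be illustrated with a minimal sketch. The toy linear-softmax "model", the gradient step, and the `below_k` parameter here are illustrative assumptions for a small example, not the authors' actual method or code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable classifier: softmax over a fixed random linear map.
NUM_CLASSES, DIM = 10, 64
W = rng.normal(size=(NUM_CLASSES, DIM))

def scores(x):
    """Softmax classification scores for input vector x."""
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def attack(x, target, legit, below_k=1, lr=0.05, iters=500):
    """Iteratively perturb x toward the target class.

    With below_k=1 the loop stops as soon as the target score edges past
    the legitimate score -- the detectable pattern the paper describes.
    With below_k > 1 it keeps going until the legitimate class's score
    has dropped below at least `below_k` other classes, mimicking the
    proposed pattern-removal stopping rule.
    """
    x_adv = x.copy()
    for _ in range(iters):
        p = scores(x_adv)
        if np.argmax(p) == target and np.sum(p > p[legit]) >= below_k:
            break
        # For a linear model, the gradient of (target logit - legit logit)
        # with respect to x is simply W[target] - W[legit].
        x_adv += lr * (W[target] - W[legit])
    return x_adv

# Demo: attack a random sample away from its original top class.
x = rng.normal(size=DIM)
legit_class = int(np.argmax(scores(x)))
target_class = (legit_class + 1) % NUM_CLASSES
x_adv = attack(x, target_class, legit_class, below_k=3)
p_adv = scores(x_adv)
distortion = float(np.linalg.norm(x_adv - x))
```

After the attack, `p_adv` is maximized at `target_class` and the legitimate class sits below at least three other classes, so its score no longer trails just behind the top score.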

Citation (APA)

Kwon, H., Ko, K., & Kim, S. (2022). Optimized Adversarial Example With Classification Score Pattern Vulnerability Removed. IEEE Access, 10, 35804–35813. https://doi.org/10.1109/ACCESS.2021.3110473
