Robust adversarial attack against explainable deep classification models based on adversarial images with different patch sizes and perturbation ratios

Abstract

In recent years, deep neural networks (DNNs) have been shown to be deceived rather easily by adversarial attack methods. In practice, adversarial patches that cause misclassification can be extremely effective. However, most existing adversarial patches are designed to attack only the DNN itself; few of them apply to both the DNN and its explanation model. In this paper, we present adversarial patches that both misguide the predictions of DNN models and alter the explanations produced by interpretation models such as gradient-weighted class activation mapping (Grad-CAM). The proposed adversarial patches have appropriate locations and perturbation ratios, yielding either visible or less visible patches. In addition, the patches are small arrays localized so that they do not cover or overlap any of the main objects in a natural image. In particular, we generate two adversarial patches that cover only 3% and 1.5% of the pixels in the original image while leaving the main objects untouched. Our experiments use four pre-trained DNN models and the ImageNet dataset. We also examine the incorrect outputs of the interpretation models through mask and heatmap visualizations. The proposed adversarial attack method can serve as a reference for developing more robust network interpretation models that are reliable for the decision-making process of pre-trained DNN models.
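To make the patch-placement constraint concrete, the following is a minimal sketch (not the paper's actual patch-generation method, which optimizes the patch contents and location against both the classifier and Grad-CAM). It only illustrates the geometric idea the abstract describes: a square patch sized to a given perturbation ratio (e.g., 3% of pixels), placed in a corner so it avoids the image centre where the main object typically sits. The function name and the random-noise patch contents are illustrative assumptions.

```python
import numpy as np

def apply_corner_patch(image, ratio=0.03, seed=0):
    """Paste a square random-noise patch covering roughly `ratio` of the
    pixels into the top-left corner, leaving the centre of the image
    (where the main object usually sits) untouched.

    This is only a geometric illustration; a real adversarial patch
    would be optimized, not random noise.
    """
    h, w, _ = image.shape
    # Side length of a square covering ~ratio of the h*w pixels.
    side = int(round(np.sqrt(ratio * h * w)))
    rng = np.random.default_rng(seed)
    patched = image.copy()
    patched[:side, :side, :] = rng.integers(
        0, 256, (side, side, 3), dtype=np.uint8
    )
    coverage = side * side / (h * w)
    return patched, coverage

# Example: a 3% patch on a 224x224 image (standard ImageNet input size).
img = np.zeros((224, 224, 3), dtype=np.uint8)
adv, coverage = apply_corner_patch(img, ratio=0.03)
print(f"patch covers {coverage:.1%} of pixels")
```

For a 224×224 input this yields a 39×39 patch (39² / 224² ≈ 3.0% of pixels), matching the larger of the two perturbation ratios reported in the abstract; `ratio=0.015` gives the smaller one.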

Citation (APA)

Le, T. T. H., Kang, H., & Kim, H. (2021). Robust adversarial attack against explainable deep classification models based on adversarial images with different patch sizes and perturbation ratios. IEEE Access, 9, 133049–133061. https://doi.org/10.1109/ACCESS.2021.3115764
