Explaining Deep Neural Network Models with Adversarial Gradient Integration


Abstract

Deep neural networks (DNNs) have become one of the highest-performing tools across a broad range of machine learning areas. However, the multi-layer non-linearity of these network architectures prevents us from gaining a better understanding of the models' predictions. Gradient-based attribution methods (e.g., Integrated Gradients (IG)) that decipher input features' contributions to the prediction task have been shown to be highly effective, yet they require a reference input as the anchor for explaining the model's output. The performance of DNN model interpretation can be quite inconsistent with regard to the choice of reference. Here we propose an Adversarial Gradient Integration (AGI) method that integrates the gradients from adversarial examples to the target example along the curve of steepest ascent to calculate the resulting contributions from all input features. Our method does not rely on the choice of reference, and hence avoids the ambiguity and inconsistency arising from reference selection. We demonstrate the performance of our AGI method and compare it with competing methods in explaining image classification results. Code is available from https://github.com/pd90506/AGI.
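A minimal sketch of the idea behind AGI, based only on the description above: instead of integrating gradients from a fixed reference to the input (as IG does), walk from the input toward an adversarial example by steepest ascent on a false class's score, accumulating gradient-times-step contributions along the way. The toy linear-softmax "model", the step size, and the sign conventions here are all illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # hypothetical toy classifier: 3 classes, 4 features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_logprob(x, c):
    # d/dx log softmax(W @ x)[c] = W[c] - sum_k p_k * W[k]
    p = softmax(W @ x)
    return W[c] - p @ W

def agi_attribution(x, true_c, false_c, steps=50, eps=0.05):
    """Sketch of reference-free attribution: ascend a false class's
    log-probability (an adversarial path) and integrate the true-class
    gradient along that path."""
    attr = np.zeros_like(x)
    x_t = x.copy()
    for _ in range(steps):
        delta = eps * np.sign(grad_logprob(x_t, false_c))  # steepest-ascent step
        attr += grad_logprob(x_t, true_c) * delta          # accumulate grad * step
        x_t = x_t + delta
    return -attr  # flip sign: the path runs away from the true class

x = rng.normal(size=4)
true_c = int(np.argmax(softmax(W @ x)))
false_c = (true_c + 1) % 3
a = agi_attribution(x, true_c, false_c)
print(a.shape)  # one attribution score per input feature
```

In this sketch a single false class is attacked; the abstract's description suggests the full method aggregates over adversarial directions, which is omitted here for brevity.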

Citation (APA)

Pan, D., Li, X., & Zhu, D. (2021). Explaining Deep Neural Network Models with Adversarial Gradient Integration. In IJCAI International Joint Conference on Artificial Intelligence (pp. 2876–2883). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/396
