Malicious Attacks against Deep Reinforcement Learning Interpretations

Mengdi Huai; Jianhui Sun; Renqin Cai; Liuyi Yao; Aidong Zhang

Conference Proceedings, Open Access

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020) 472-482

DOI: 10.1145/3394486.3403089

Abstract

Recent years have witnessed the rapid development of deep reinforcement learning (DRL), which combines deep learning with reinforcement learning (RL). However, the adoption of deep neural networks makes the decision-making process of DRL opaque. Motivated by this, various interpretation methods for DRL have been proposed. These methods, however, implicitly assume that they are performed in a reliable and secure environment. In practice, sequential agent-environment interactions expose DRL algorithms and their downstream interpretations to additional adversarial risk. Despite the prevalence of malicious attacks, no existing work studies the feasibility of such attacks against DRL interpretations. To bridge this gap, we investigate the vulnerability of DRL interpretation methods. Specifically, we present the first study of adversarial attacks against DRL interpretations and propose an optimization framework from which the optimal adversarial attack strategy can be derived. In addition, we study the vulnerability of DRL interpretation methods to model poisoning attacks and present an algorithmic framework that rigorously formulates the proposed model poisoning attack. Finally, we conduct both theoretical analysis and extensive experiments to validate the effectiveness of the proposed malicious attacks against DRL interpretations.

Citation (APA)

Huai, M., Sun, J., Cai, R., Yao, L., & Zhang, A. (2020). Malicious Attacks against Deep Reinforcement Learning Interpretations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 472–482). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403089
