Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions

Abstract

Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, however, robustness to noise in observation data remains an issue for control policies trained with DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During adversarial learning, policies are trained through regularization to output consistent control actions even for adversarial input examples. Importantly, these adversarial examples are crafted to lead the current policy to predict the worst action at each state. Although a naive implementation of this regularization can cause the DRL model to learn a biased objective function, our methods were designed to minimize this bias. When implemented as modifications of a deep Q-network (DQN) for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient (DDPG) for continuous-action tasks in PyBullet, our adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.
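To make the abstract's core mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the idea for the DQN case, not the authors' implementation: an FGSM-style perturbation steers the observation toward the input on which the current Q-network would prefer its worst-rated action, and a KL consistency term regularizes the policy to act the same on clean and perturbed inputs. All names (`worst_case_perturbation`, `regularized_dqn_loss`), the loss choices, and the hyperparameters `epsilon` and `beta` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def worst_case_perturbation(q_net, state, epsilon=0.01):
    """FGSM-style perturbation that nudges the observation so the
    Q-network's currently worst-rated action becomes preferred."""
    state = state.clone().detach().requires_grad_(True)
    q = q_net(state)                       # (batch, num_actions)
    worst = q.argmin(dim=1)                # worst action under the current policy
    # Cross-entropy toward the worst action: decreasing it raises that
    # action's relative (softmax) preference.
    loss = F.cross_entropy(q, worst)
    grad = torch.autograd.grad(loss, state)[0]
    # Step against the gradient so the perturbed state favors the worst action.
    return (state - epsilon * grad.sign()).detach()

def regularized_dqn_loss(q_net, target_net, batch, gamma=0.99, beta=1.0):
    s, a, r, s2, done = batch
    # Standard DQN temporal-difference loss.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    td = F.smooth_l1_loss(q_sa, target)
    # Consistency regularizer: action preferences on the adversarial state
    # should match those on the clean state.
    s_adv = worst_case_perturbation(q_net, s)
    p_clean = F.softmax(q_net(s), dim=1).detach()
    logp_adv = F.log_softmax(q_net(s_adv), dim=1)
    reg = F.kl_div(logp_adv, p_clean, reduction="batchmean")
    return td + beta * reg
```

Detaching the clean-state distribution in the regularizer is one way to keep the consistency term from distorting the TD objective, echoing the paper's concern about a naive implementation learning a biased objective, though the authors' actual debiasing scheme may differ.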

Citation (APA)

Ohashi, K., Nakanishi, K., Yasui, Y., & Ishii, S. (2023). Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions. IEEE Access, 11, 100798–100809. https://doi.org/10.1109/ACCESS.2023.3314750
