The field of reinforcement learning has been significantly advanced by the application of deep learning. Deep Deterministic Policy Gradient (DDPG), an actor-critic method for continuous control, can derive satisfactory policies using a deep neural network. However, in common with other deep neural networks, DDPG requires a large number of training samples and careful hyperparameter tuning. In this paper, we propose a Stochastic Value Function (SVF) that treats a value function such as the Q function as a stochastic variable sampled from N(μ_Q, σ_Q). To learn appropriate value functions, we use Bayesian regression with KL divergence in place of simple regression with squared errors. We demonstrate that the technique used in Trust Region Policy Optimization (TRPO) can provide efficient learning. We implemented DDPG with SVF (DDPG-SVF) and confirmed (1) that DDPG-SVF converged well, with high sampling efficiency, (2) that DDPG-SVF obtained good results while requiring less hyperparameter tuning, and (3) that the TRPO technique offers an effective way of addressing the hyperparameter tuning problem.
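The abstract's core idea, replacing a squared-error critic loss with a KL divergence between Gaussian value distributions, can be illustrated with a minimal sketch. The closed-form KL between two univariate Gaussians below is standard; the function name `gaussian_kl` and the specific parameterization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    # Closed-form KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2));
    # used here as a hypothetical critic loss in place of (mu_p - mu_q)^2.
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Identical distributions incur zero loss.
print(gaussian_kl(0.0, 1.0, 0.0, 1.0))  # → 0.0

# With equal standard deviations the KL reduces to a scaled squared error,
# (mu_p - mu_q)^2 / (2 * sigma^2), recovering the familiar regression objective.
print(gaussian_kl(1.0, 1.0, 0.0, 1.0))  # → 0.5
```

Note that when both σ terms are held fixed and equal, minimizing this KL is equivalent to minimizing the squared error, so the stochastic treatment strictly generalizes the standard critic update.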
Citation
Hatsugai, R., & Inaba, M. (2018). Robust reinforcement learning with a stochastic value function. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10710 LNCS, pp. 519–526). Springer Verlag. https://doi.org/10.1007/978-3-319-72926-8_43