Robust reinforcement learning with a stochastic value function

Abstract

The field of reinforcement learning has been significantly advanced by the application of deep learning. Deep Deterministic Policy Gradient (DDPG), an actor-critic method for continuous control, can derive satisfactory policies using a deep neural network. However, like other deep neural networks, DDPG requires a large number of training samples and careful hyperparameter tuning. In this paper, we propose a Stochastic Value Function (SVF), which treats a value function such as the Q function as a stochastic variable sampled from N(μ_Q, σ_Q). To learn appropriate value functions, we use Bayesian regression with KL divergence in place of simple regression with squared errors. We demonstrate that the technique used in Trust Region Policy Optimization (TRPO) can provide efficient learning. We implemented DDPG with SVF (DDPG-SVF) and confirmed (1) that DDPG-SVF converged well, with high sampling efficiency, (2) that DDPG-SVF obtained good results while requiring less hyperparameter tuning, and (3) that the TRPO technique offers an effective way of addressing the hyperparameter tuning problem.
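
The abstract only outlines the idea, so the sketch below is an illustration rather than the authors' implementation: a PyTorch-style critic that outputs the parameters (μ_Q, σ_Q) of a Gaussian over Q(s, a), trained by KL-divergence regression toward a target distribution instead of squared-error regression. Names such as StochasticQ and critic_loss, the network sizes, and the fixed target σ are hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class StochasticQ(nn.Module):
    """Critic that predicts a Gaussian over Q(s, a): mean mu_Q and std sigma_Q."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, 1)
        self.log_sigma_head = nn.Linear(hidden, 1)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        mu = self.mu_head(h)
        sigma = self.log_sigma_head(h).exp().clamp(min=1e-3)  # keep sigma positive
        return Normal(mu, sigma)

def critic_loss(pred_dist, target_mu, target_sigma=1.0):
    """KL-divergence regression toward the Bellman-target distribution,
    in place of the usual squared-error regression on Q-values."""
    target_dist = Normal(target_mu, torch.as_tensor(target_sigma))
    return kl_divergence(pred_dist, target_dist).mean()
```

In a DDPG-style loop, target_mu would be the usual bootstrapped target r + γ·Q'(s', π'(s')), and the KL term replaces the MSE critic loss; how the paper constrains the update with the TRPO-style trust region is not specified in the abstract.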

Cite

CITATION STYLE

APA

Hatsugai, R., & Inaba, M. (2018). Robust reinforcement learning with a stochastic value function. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10710 LNCS, pp. 519–526). Springer Verlag. https://doi.org/10.1007/978-3-319-72926-8_43
