Reinforcement learning with guided policy search using Gaussian processes


Abstract

Gradient-based policy search algorithms benefit greatly from the availability of a properly estimated state or state-action value function, which can be used to reduce the variance of the gradient estimates. Additionally, the use of Gaussian processes for value function approximation provides a fully probabilistic model in which the uncertainty in the estimated value function can be used to assess the amount of exploration required. In this article we present two modalities for adjusting different characteristics of the exploration in on-line learning of control policies for problems with continuous state-action spaces. The proposed methods exploit the fully probabilistic nature of the Gaussian processes and aim to constrain the exploration to relevant subspaces only, thereby speeding up convergence. We present experiments on a simulated control task to demonstrate the validity of our algorithms.
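
To make the abstract's central idea concrete, the sketch below illustrates one way uncertainty-driven exploration of this kind can be set up: a Gaussian-process estimate of the value function whose predictive standard deviation sets the scale of the Gaussian exploration noise, so the agent explores more where the value estimate is uncertain and less where it is confident. This is not the authors' algorithm; the kernel choice, constants, and all function names are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): GP value estimate + uncertainty-scaled exploration.
import numpy as np

def sq_exp_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance between state sets A (n, d) and B (m, d)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

class GPValueEstimate:
    """GP regression on (state, observed return) pairs; predicts value mean and variance."""
    def __init__(self, states, returns, noise_var=1e-2):
        self.X = states
        K = sq_exp_kernel(states, states) + noise_var * np.eye(len(states))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, returns))

    def predict(self, x):
        k = sq_exp_kernel(self.X, x[None, :])                 # (n, 1) cross-covariance
        mean = (k.T @ self.alpha).item()
        v = np.linalg.solve(self.L, k)
        var = (sq_exp_kernel(x[None, :], x[None, :]) - v.T @ v).item()
        return mean, max(var, 0.0)

def exploratory_action(policy_mean, state, gp, min_sigma=0.05, scale=1.0, rng=None):
    """Gaussian exploration whose standard deviation tracks the GP's value uncertainty."""
    rng = np.random.default_rng() if rng is None else rng
    _, var = gp.predict(state)
    sigma = min_sigma + scale * np.sqrt(var)
    return policy_mean(state) + rng.normal(0.0, sigma)

# Toy usage: a 1-D state space with a handful of visited states.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.uniform(-1.0, 1.0, size=(20, 1))
    observed_returns = np.sin(3.0 * visited[:, 0]) + 0.1 * rng.standard_normal(20)
    gp = GPValueEstimate(visited, observed_returns)
    mean_policy = lambda s: -0.5 * s[0]                       # placeholder deterministic policy
    for s in (np.array([0.0]), np.array([2.0])):              # well-explored vs. unexplored state
        print(s, exploratory_action(mean_policy, s, gp, rng=rng))
```

In this toy setting the action taken at the unexplored state (where the GP variance is large) receives a wider noise perturbation than the one at the well-explored state, which is the qualitative behaviour the abstract describes for concentrating exploration where the value estimate is least reliable.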



Authors

  • Hunor S Jakab

  • Lehel Csató
