Policy Search via the Signed Derivative

  • J. Zico Kolter
  • Andrew Y. Ng

We consider policy search for reinforcement learning: learning policy parameters, for some fixed policy class, that optimize the performance of a system. In this paper, we propose a novel policy gradient method based on an approximation we call the Signed Derivative; the approximation rests on the intuition that it is often very easy to guess the direction in which control inputs affect future state variables, even if we do not have an accurate model of the system. The resulting algorithm is very simple and requires no model of the environment. We show that it can outperform standard stochastic estimators of the gradient; indeed, the Signed Derivative algorithm can perform as well as the true (model-based) policy gradient, but without knowledge of the model. We evaluate the algorithm's performance on a simulated task and on two real-world tasks, driving an RC car along a specified trajectory and jumping onto obstacles with a quadruped robot, and in all cases achieve good performance after very little training.
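
The core approximation described above can be illustrated in a few lines: a gradient step on a tracking cost in which the unknown Jacobian of the state with respect to the control input is replaced by a vector of guessed signs (e.g. "more throttle moves the car forward"). The sketch below is a heavily simplified illustration of that intuition, not the authors' algorithm; the linear policy, scalar control input, quadratic tracking cost, single-step dependence of state on control, and all function and variable names are assumptions made for this example.

```python
import numpy as np

def signed_derivative_step(theta, states, targets, features, signs, alpha=1e-2):
    """One illustrative gradient step (hypothetical helper, not from the paper).

    theta    : (k,) parameters of a linear policy u_t = theta . features[t]
    states   : (T, n) observed state trajectory
    targets  : (T, n) desired state trajectory
    features : (T, k) policy features at each time step
    signs    : (n,) guessed signs of d(state)/d(control), entries in {-1, 0, +1}
    """
    grad = np.zeros_like(theta)
    T = len(states)
    for t in range(T):
        err = states[t] - targets[t]          # tracking error at time t
        # Chain rule: dJ/dtheta ~ err^T * (d state / d u) * (d u / d theta).
        # The unknown Jacobian d state / d u is replaced by the fixed guessed
        # sign vector `signs`; d u / d theta = features[t] for a linear policy.
        grad += (err @ signs) * features[t]
    return theta - alpha * grad / T

# Toy usage: two state variables, scalar control, random data for illustration.
rng = np.random.default_rng(0)
theta = np.zeros(3)
states = rng.normal(size=(50, 2))
targets = np.zeros((50, 2))
features = rng.normal(size=(50, 3))
signs = np.array([1.0, 1.0])   # guess: increasing the control pushes both state variables up
theta = signed_derivative_step(theta, states, targets, features, signs)
```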

Author-supplied keywords

  • RL
  • dynamic robot control
  • policy gradient
  • robot learning
