Policy Search via the Signed Derivative

  • J. Zico Kolter
  • Andrew Y. Ng

We consider policy search for reinforcement learning: learning policy parameters, for some fixed policy class, that optimize the performance of a system. In this paper, we propose a novel policy gradient method based on an approximation we call the Signed Derivative; the approximation rests on the intuition that it is often very easy to guess the direction in which control inputs affect future state variables, even if we do not have an accurate model of the system. The resulting algorithm is very simple and requires no model of the environment. We show that it can outperform standard stochastic estimators of the gradient; indeed, the Signed Derivative algorithm can perform as well as the true (model-based) policy gradient, but without knowledge of the model. We evaluate the algorithm's performance on a simulated task and on two real-world tasks, driving an RC car along a specified trajectory and jumping onto obstacles with a quadruped robot, and in all cases achieve good performance after very little training.
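
The core approximation described above can be illustrated in a few lines: a gradient step on a tracking cost in which the unknown Jacobian of the state with respect to the control input is replaced by a vector of guessed signs (e.g. "more throttle moves the car forward"). The sketch below is a heavily simplified illustration of that intuition, not the authors' algorithm; the linear policy, scalar control input, quadratic tracking cost, single-step dependence of state on control, and all function and variable names are assumptions made for this example.

```python
import numpy as np

def signed_derivative_step(theta, states, targets, features, signs, alpha=1e-2):
    """One illustrative gradient step (hypothetical helper, not from the paper).

    theta    : (k,) parameters of a linear policy u_t = theta . features[t]
    states   : (T, n) observed state trajectory
    targets  : (T, n) desired state trajectory
    features : (T, k) policy features at each time step
    signs    : (n,) guessed signs of d(state)/d(control), entries in {-1, 0, +1}
    """
    grad = np.zeros_like(theta)
    T = len(states)
    for t in range(T):
        err = states[t] - targets[t]          # tracking error at time t
        # Chain rule: dJ/dtheta ~ err^T * (d state / d u) * (d u / d theta).
        # The unknown Jacobian d state / d u is replaced by the fixed guessed
        # sign vector `signs`; d u / d theta = features[t] for a linear policy.
        grad += (err @ signs) * features[t]
    return theta - alpha * grad / T

# Toy usage: two state variables, scalar control, random data for illustration.
rng = np.random.default_rng(0)
theta = np.zeros(3)
states = rng.normal(size=(50, 2))
targets = np.zeros((50, 2))
features = rng.normal(size=(50, 3))
signs = np.array([1.0, 1.0])   # guess: increasing the control pushes both state variables up
theta = signed_derivative_step(theta, states, targets, features, signs)
```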

Author-supplied keywords

  • RL
  • dynamic robot control
  • policy gradient
  • robot learning
