Reinforcement learning with function approximation converges to a region


Abstract

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program.
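To make the setting concrete, below is a minimal sketch of SARSA(0) with linear function approximation, the first of the two algorithms the paper analyzes. The toy MDP, random feature map, epsilon-greedy policy, and 1/t step-size schedule are illustrative assumptions, not details taken from the paper; the point is the shape of the weight update, whose iterates the paper shows remain in a bounded region rather than diverging.

```python
# Sketch of SARSA(0) with linear function approximation.
# The MDP, features, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 3
# Random linear features phi(s, a); a real application would design these.
phi = rng.normal(size=(n_states, n_actions, n_features))

# Arbitrary MDP dynamics: transition probabilities and rewards.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

gamma = 0.9    # discount factor
epsilon = 0.1  # epsilon-greedy exploration
w = np.zeros(n_features)

def q(s, a):
    """Approximate action value Q_w(s, a) = w . phi(s, a)."""
    return w @ phi[s, a]

def policy(s):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

s, a = 0, policy(0)
for t in range(1, 100_001):
    r = R[s, a]
    s2 = int(rng.choice(n_states, p=P[s, a]))
    a2 = policy(s2)
    alpha = 1.0 / t  # decreasing step size
    # SARSA(0) temporal-difference update on the weights.
    delta = r + gamma * q(s2, a2) - q(s, a)
    w += alpha * delta * phi[s, a]
    s, a = s2, a2

# Per the paper's result, w may oscillate but cannot diverge:
# it converges to a bounded region of weight space.
print("final weights:", w)
```

V(0), the second algorithm analyzed, follows the same update pattern but learns a state-value function V_w(s) under a fixed policy, as in TD-Gammon's training.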


Authors

  • Geoffrey J Gordon
