Linear least-squares algorithms for temporal difference learning

Abstract

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σTD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σTD. In addition to converging more rapidly, LS TD and RLS TD have no control parameters, such as a learning rate, thus eliminating the possibility of poor performance caused by an unlucky choice of parameters. © 1996 Kluwer Academic Publishers.
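The abstract does not spell out the update rules, but the standard form of LS TD(0) solves the linear system Aθ = b with A accumulated from feature differences and b from rewards, while RLS TD maintains an estimate of A⁻¹ incrementally via the Sherman–Morrison identity. The sketch below illustrates both under those assumptions; the function names, the δ initialization constant, and the toy two-state chain are illustrative, not taken from the paper.

```python
import numpy as np

def lstd(transitions, n_features, gamma):
    """Batch LS TD(0) sketch: accumulate A = sum phi (phi - gamma*phi')^T
    and b = sum r*phi over transitions, then solve A theta = b."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def rls_td(transitions, n_features, gamma, delta=100.0):
    """Recursive sketch (RLS TD): track P ~ A^{-1} with a rank-one
    Sherman-Morrison update, avoiding a matrix solve at every step."""
    P = delta * np.eye(n_features)      # initial inverse estimate (delta is a tunable prior, an assumption here)
    theta = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        d = phi - gamma * phi_next          # feature difference
        k = P @ phi / (1.0 + d @ (P @ phi)) # gain vector
        theta = theta + k * (r - d @ theta) # TD-error-driven parameter update
        P = P - np.outer(k, d @ P)          # rank-one update of the inverse
    return theta

# Toy deterministic chain with one-hot features: s0 -> s1 -> terminal,
# reward 1 per step, so V(s1) = 1 and V(s0) = 1 + gamma.
gamma = 0.9
e0, e1 = np.eye(2)
trans = [(e0, 1.0, e1), (e1, 1.0, np.zeros(2))]
theta_batch = lstd(trans, 2, gamma)
theta_recursive = rls_td(trans * 200, 2, gamma)  # repeated passes wash out the delta*I prior
```

On this chain the batch solve returns the exact values [1 + γ, 1], and the recursive version approaches them as more transitions are processed, matching the abstract's claim that the two methods extract the same least-squares answer from the data.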

Citation (APA)

Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1–3), 33–57. https://doi.org/10.1007/BF00114723
