Linear least-squares algorithms for temporal difference learning

Steven J. Bradtke

Machine Learning (1996) 22(1-3) 33-57

DOI: 10.1007/BF00114723

Abstract

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD), for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σTD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σTD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters. © 1996 Kluwer Academic Publishers.
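
The abstract does not reproduce the update equations, so the following is only a minimal sketch of the commonly stated form of least-squares TD with linear features: accumulate A = Σ φ_t (φ_t − γ φ_{t+1})ᵀ and b = Σ φ_t r_t, then solve A θ = b. The recursive variant maintains the inverse of A with a rank-one (Sherman-Morrison) update instead of solving from scratch. The function names `lstd` and `rls_td`, the small `ridge` term used to keep A invertible at the start, and the three-state toy chain are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def lstd(phis, rewards, next_phis, gamma, ridge=1e-3):
    """Batch least-squares TD sketch: solve A theta = b.

    `ridge` is an assumed initialisation (A starts at ridge * I) so the
    system is solvable before every feature has been observed.
    """
    k = phis.shape[1]
    A = ridge * np.eye(k)
    b = np.zeros(k)
    for phi, r, phi_next in zip(phis, rewards, next_phis):
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)


def rls_td(phis, rewards, next_phis, gamma, ridge=1e-3):
    """Recursive sketch: track P = A^{-1} with Sherman-Morrison updates."""
    k = phis.shape[1]
    P = np.eye(k) / ridge          # inverse of the assumed A_0 = ridge * I
    theta = np.zeros(k)
    for phi, r, phi_next in zip(phis, rewards, next_phis):
        d = phi - gamma * phi_next
        P_phi = P @ phi
        denom = 1.0 + d @ P_phi
        # Correct theta using the current one-step prediction error.
        theta = theta + P_phi * (r - d @ theta) / denom
        # Rank-one update of the inverse for A <- A + phi d^T.
        P = P - np.outer(P_phi, d @ P) / denom
    return theta


# Hypothetical toy problem: deterministic cycle 0 -> 1 -> 2 -> 0,
# reward 1 on leaving state 0, one-hot (tabular) features.
gamma = 0.9
states = np.arange(300) % 3
features = np.eye(3)
phis = features[states]
next_phis = features[(states + 1) % 3]
rewards = (states == 0).astype(float)

theta_batch = lstd(phis, rewards, next_phis, gamma)
theta_rec = rls_td(phis, rewards, next_phis, gamma)
```

With tabular features the learned weights are the state-value estimates themselves, so in this toy chain the first weight should come out close to 1 / (1 − γ³). Under these assumptions both functions process the same transitions and should agree up to floating-point error; the recursive form simply avoids re-solving the linear system after each step.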

Cite

Bradtke, S. J. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1–3), 33–57. https://doi.org/10.1007/BF00114723
