Learning to predict by the methods of temporal differences

  • Sutton R

Abstract

This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods, and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
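
The contrast the abstract draws between the two credit-assignment rules can be made concrete with a minimal sketch. The small random-walk task, the function names, the step size, and the episode counts below are all assumptions chosen for illustration and are not taken from this abstract; the code shows only the simplest one-step form of a temporal-difference update, not the general procedure analyzed in the article.

```python
import random

def generate_episode(rng, n_states=5):
    """Hypothetical task: random walk over states 0..n_states-1, starting in the middle.
    Stepping off the left end yields outcome 0.0, off the right end 1.0."""
    state = n_states // 2
    visited = []
    while 0 <= state < n_states:
        visited.append(state)
        state += rng.choice((-1, 1))
    outcome = 1.0 if state >= n_states else 0.0
    return visited, outcome

def supervised_update(pred, visited, outcome, alpha):
    """Conventional rule: the error is (actual outcome - prediction)."""
    for s in visited:
        pred[s] += alpha * (outcome - pred[s])

def td_update(pred, visited, outcome, alpha):
    """Temporal-difference rule: the error is (next prediction - prediction).
    Only the final step of the episode is compared with the actual outcome."""
    for t, s in enumerate(visited):
        target = pred[visited[t + 1]] if t + 1 < len(visited) else outcome
        pred[s] += alpha * (target - pred[s])

def train(update, episodes=5000, alpha=0.01, seed=0):
    """Apply one of the update rules over many simulated episodes."""
    rng = random.Random(seed)
    pred = [0.5] * 5  # initial predictions for the five states
    for _ in range(episodes):
        visited, outcome = generate_episode(rng)
        update(pred, visited, outcome, alpha)
    return pred

if __name__ == "__main__":
    # For this walk the true probabilities of ending on the right are 1/6, 2/6, ..., 5/6.
    print("supervised:", [round(p, 2) for p in train(supervised_update)])
    print("TD:        ", [round(p, 2) for p in train(td_update)])
```

Note that `td_update` can adjust a prediction as soon as the next state is observed, whereas `supervised_update` must wait for the episode's outcome. This is one way to read the abstract's remark about lower memory and peak computation.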

Cite

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44. https://doi.org/10.1007/bf00115009
