Reinforcement learning with replacing eligibility traces


Abstract

The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well-known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator. © 1996 Kluwer Academic Publishers.
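
The difference between the two kinds of trace can be illustrated with a minimal sketch for a single state. It is not taken from the paper: the function name trace_after_visits, the single decay parameter (standing in for the usual per-step trace decay), and the example visit sequence are all assumptions made for illustration. On a visit, the conventional (accumulating) trace is incremented by 1, whereas the replacing trace is reset to 1.

```python
def trace_after_visits(visits, decay, kind):
    """Return the eligibility trace of one state after a sequence of steps.

    visits: iterable of booleans, True when the state is visited on that step.
    decay:  assumed per-step decay factor applied to the trace (0 <= decay <= 1).
    kind:   "accumulating" (conventional trace) or "replacing".
    """
    if kind not in ("accumulating", "replacing"):
        raise ValueError("kind must be 'accumulating' or 'replacing'")

    e = 0.0
    for visited in visits:
        e *= decay  # the trace fades on every step
        if visited:
            if kind == "accumulating":
                e += 1.0  # repeated visits pile up, so the trace can exceed 1
            else:
                e = 1.0   # a visit resets the trace to 1, whatever it was before
    return e


# Hypothetical example: the state is visited twice in a row, then not visited.
visits = [True, True, False]
accumulating = trace_after_visits(visits, decay=0.5, kind="accumulating")  # 0.75
replacing = trace_after_visits(visits, decay=0.5, kind="replacing")        # 0.5
```

In this example the repeated visit gives the accumulating trace extra credit (it reaches 1.5 before decaying to 0.75), while the replacing trace never exceeds 1. This mirrors the abstract's statement that only the conventional trace gives greater credit to repeated events.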

Cite

Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1–3), 123–158. https://doi.org/10.1007/BF00114726
