Q(λ)-learning uses TD(λ) methods to accelerate Q-learning. For previous online, lookup-table-based implementations of Q(λ), the worst-case complexity of a single update step is bounded by the size of the state-action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
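A minimal sketch of the postponement idea follows (it is not the paper's exact pseudocode, and all names here are illustrative). Since a replacing trace set to 1 at step t0 has decayed to (γλ)^(t−t0) by step t, the sum of all postponed trace updates for a pair is a difference of two prefix sums of decayed TD errors, applied lazily only when the pair is next read or visited. Watkins-style trace cutting on exploratory actions is omitted for brevity.

```python
from collections import defaultdict

class LazyQLambda:
    """Sketch of lazy (postponed) Q(lambda) updates with replacing traces."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, lam=0.8):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)    # Q-values keyed by (state, action)
        self.trace_time = {}           # step at which a pair's trace was set to 1
        self.synced = {}               # prefix index up to which errors were applied
        self.delta_prefix = [0.0]      # delta_prefix[k] = sum_{t<k} err_t * (gamma*lam)**t
        self.t = 0                     # global step counter

    def _sync(self, sa):
        """Apply all TD errors postponed since this pair was last touched."""
        if sa in self.trace_time:
            decay = self.gamma * self.lam
            pending = self.delta_prefix[self.t] - self.delta_prefix[self.synced[sa]]
            # Stored errors carry a factor decay**t; the pair's trace carries only
            # decay**(t - t0), hence the rescaling below.  (In long runs decay**t
            # underflows; a practical version needs periodic renormalization,
            # omitted here for brevity.)
            self.q[sa] += self.alpha * pending / decay ** self.trace_time[sa]
            self.synced[sa] = self.t

    def value(self, state, action):
        self._sync((state, action))
        return self.q[(state, action)]

    def step(self, state, action, reward, next_state):
        """One transition: only the current pair and the next state's actions
        are synced, so the cost is O(n_actions), not O(|S||A|)."""
        sa = (state, action)
        self._sync(sa)
        next_v = max(self.value(next_state, a) for a in range(self.n_actions))
        err = reward + self.gamma * next_v - self.q[sa]
        self.q[sa] += self.alpha * err  # visited pair: trace is 1, update now
        self.delta_prefix.append(self.delta_prefix[-1]
                                 + err * (self.gamma * self.lam) ** self.t)
        self.trace_time[sa] = self.t    # replacing trace, reset to 1
        self.synced[sa] = self.t + 1    # err already applied directly above
        self.t += 1
```

All other traced pairs are left untouched each step; the single appended prefix entry stands in for the per-pair trace updates that a naive implementation would perform over the whole state-action space.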
CITATION STYLE
Wiering, M., & Schmidhuber, J. (1998). Speeding up Q(λ)-learning. In Lecture Notes in Computer Science (Vol. 1398, pp. 352–363). Springer-Verlag. https://doi.org/10.1007/bfb0026706