An upper bound on the loss from approximate optimal-value functions

  • Singh, S. P.
  • Yee, R. C.

Abstract

Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
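For context, the bound in question is commonly quoted in later literature in the following form, where γ ∈ [0, 1) is the discount factor, V* and Q* are the optimal value and action-value functions, and π is the greedy policy with respect to the approximation. This is a sketch of the standard statement, not necessarily the paper's slightly tighter constant:

\[
\max_{s}\,\bigl|V(s) - V^{*}(s)\bigr| \le \varepsilon
\;\Longrightarrow\;
\max_{s}\,\bigl(V^{*}(s) - V^{\pi}(s)\bigr) \le \frac{2\gamma\varepsilon}{1-\gamma},
\]
and, for the Q-learning case with \(\pi(s) \in \arg\max_{a} Q(s,a)\),
\[
\max_{s,a}\,\bigl|Q(s,a) - Q^{*}(s,a)\bigr| \le \varepsilon
\;\Longrightarrow\;
\max_{s}\,\bigl(V^{*}(s) - V^{\pi}(s)\bigr) \le \frac{2\varepsilon}{1-\gamma}.
\]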

Citation (APA)

Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3), 227–233. https://doi.org/10.1007/bf00993308
