Currently popular reinforcement learning methods are based on estimating value functions that indicate the long-term value of each problem state. In many domains, such as those traditionally studied in AI planning research, the size of the state space precludes storing a value estimate for each state individually. Consequently, most practical implementations of reinforcement learning methods have stored value functions using generalizing function approximators, with mixed results. We analyze the effects of approximation error on performance in goal-based tasks, revealing potentially severe scaling difficulties. Empirical evidence is presented that suggests when such difficulties are likely to occur and explains some of the widely differing results reported in the literature.
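As a rough illustration of the setup the abstract describes (not the authors' own experiments), the sketch below estimates a discounted value function for a small goal-based chain task using TD(0) with a linear function approximator in place of a table. The chain environment, the random feature map, and all parameter values are assumptions chosen only to show how updating the estimate for one state perturbs the estimates for other, similar states.

```python
import numpy as np

# Minimal sketch, assuming a toy chain environment: discounted TD(0) value
# estimation with a linear function approximator instead of a lookup table.
N_STATES = 20          # goal-based chain; state N_STATES - 1 is the goal
GAMMA = 0.95           # discount factor
ALPHA = 0.05           # step size
N_FEATURES = 6         # fewer features than states, so estimates must generalize

rng = np.random.default_rng(0)
features = rng.normal(size=(N_STATES, N_FEATURES))  # fixed random features per state
weights = np.zeros(N_FEATURES)                       # V(s) = features[s] @ weights

def value(s):
    return features[s] @ weights

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # random-walk behavior policy: step left or right along the chain
        s_next = int(min(max(s + rng.choice([-1, 1]), 0), N_STATES - 1))
        reached_goal = (s_next == N_STATES - 1)
        r = 1.0 if reached_goal else 0.0             # reward only at the goal
        target = r + (0.0 if reached_goal else GAMMA * value(s_next))
        td_error = target - value(s)
        weights += ALPHA * td_error * features[s]    # update spills over to similar states
        s = s_next

print("Approximate values near the goal:",
      [round(float(value(s)), 3) for s in range(N_STATES - 5, N_STATES)])
```

Because there are fewer weights than states, no setting of the weights can represent every state's value exactly; the residual approximation error in sketches of this kind is the sort of error whose effect on goal-based task performance the paper analyzes.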
McDonald, M. A. F., & Hingston, P. (1997). Discounted reinforcement learning does not scale. Computational Intelligence, 13(1), 126–143. https://doi.org/10.1111/0824-7935.00035