Recursive adaptation of stepsize parameter for non-stationary environments


Abstract

In this article, we propose a method to adapt the stepsize parameters used in reinforcement learning for non-stationary environments. In typical reinforcement learning settings, the stepsize parameter is decayed to zero during learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, we assume that the true expected reward changes over time, and hence the learning agent must adapt to the change through continuous learning. We derive the higher-order derivatives, with respect to the stepsize parameter, of the exponential moving average (which is used to estimate the expected values of states or actions in major reinforcement learning methods). We also present a mechanism to calculate these derivatives in a recursive manner. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that optimizes a given criterion, for example, minimizing squared errors. The proposed method is validated both theoretically and experimentally. © 2010 Springer.
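The core idea, recursively tracking how the exponential-moving-average estimate depends on the stepsize and using that derivative to tune the stepsize online, can be sketched to first order as follows. This is an illustrative simplification, not the paper's full higher-order recursion; the function name `adapt_ema` and the meta-stepsize `eta` are assumptions for the sketch.

```python
def adapt_ema(rewards, alpha=0.1, eta=0.01):
    """Track a reward stream with an EMA whose stepsize alpha is itself
    adapted by gradient descent on the squared one-step prediction error.

    Sketch only: keeps the first-order derivative dV/dalpha recursively,
    rather than the higher-order derivatives derived in the paper.
    """
    v = 0.0       # current EMA estimate of the expected reward
    dv = 0.0      # dV/dalpha, maintained recursively alongside V
    estimates = []
    for r in rewards:
        err = r - v
        # Gradient step on (1/2) * err^2 with respect to alpha:
        #   d/dalpha (1/2)(r - V)^2 = -err * dV/dalpha
        alpha = min(max(alpha + eta * err * dv, 0.0), 1.0)
        # EMA update V' = V + alpha * err implies the recursion
        #   dV'/dalpha = (1 - alpha) * dV/dalpha + err
        dv = (1.0 - alpha) * dv + err
        v += alpha * err
        estimates.append(v)
    return v, alpha, estimates
```

Under a sudden shift in the true expected reward, the persistent prediction error drives `alpha` back up, so the estimator re-tracks the new level instead of freezing, which is the behavior the paper targets in non-stationary environments.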

Citation (APA)

Noda, I. (2010). Recursive adaptation of stepsize parameter for non-stationary environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5924 LNAI, pp. 74–90). https://doi.org/10.1007/978-3-642-11814-2_5
