Self-organized reinforcement learning based on policy gradient in nonstationary environments

Abstract

In real-world problems, the environment surrounding a controlled system is nonstationary, and the optimal control may change over time. Such controls are difficult to learn with reinforcement learning (RL), which usually assumes a stationary Markov decision process. A modular RL method was formerly proposed by Doya et al., in which multiple paired predictors and controllers were gated to produce nonstationary controls, and its effectiveness in nonstationary problems was demonstrated. However, learning the time-dependent decomposition into constituent pairs could be unstable, and the resulting control was somewhat obscure due to the heuristic combination of predictors and controllers. To overcome these difficulties, we propose a new modular RL algorithm, in which predictors are learned in a self-organized manner to realize stable decomposition and controllers are appropriately optimized by a policy gradient-based RL method. Computer simulations show that our method achieves faster and more stable learning than the previous one. © Springer-Verlag Berlin Heidelberg 2008.
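The abstract does not give implementation details, so the following toy Python sketch is only a hypothetical illustration of the general idea it describes: modules whose predictors are learned by soft competitive (self-organized) learning, with responsibilities gating per-module policies that are updated by a REINFORCE-style policy gradient. The environment, module count, learning rates, and the responsibility-weighted update rule are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonstationary environment (hypothetical): the rewarding action flips every
# 500 steps, and the observation carries a noisy cue about the active regime.
def env_step(t, action):
    good_action = 0 if (t // 500) % 2 == 0 else 1
    reward = 1.0 if action == good_action else 0.0
    obs = (1.0 if good_action == 0 else -1.0) + 0.3 * rng.normal()
    return obs, reward

N_MODULES, N_ACTIONS = 2, 2
predictors = 0.1 * rng.standard_normal(N_MODULES)   # each module predicts the observation
policy_logits = np.zeros((N_MODULES, N_ACTIONS))     # per-module softmax policy parameters
baseline = 0.0                                       # reward baseline for variance reduction

ALPHA_PRED, ALPHA_PI, ALPHA_BASE, TEMP = 0.05, 0.2, 0.05, 0.5

obs, _ = env_step(0, 0)
for t in range(1, 5000):
    # Soft gating: responsibilities from a softmax over negative prediction errors,
    # so the module that predicts the current observation best dominates.
    errors = (obs - predictors) ** 2
    resp = np.exp(-errors / TEMP)
    resp /= resp.sum()

    # Self-organized predictor update: soft competitive learning weighted by responsibility.
    predictors += ALPHA_PRED * resp * (obs - predictors)

    # Mixture policy: responsibility-weighted combination of the module policies.
    probs = np.zeros(N_ACTIONS)
    for m in range(N_MODULES):
        e = np.exp(policy_logits[m] - policy_logits[m].max())
        probs += resp[m] * e / e.sum()
    action = rng.choice(N_ACTIONS, p=probs)

    obs, reward = env_step(t, action)

    # REINFORCE-style policy-gradient step per module, weighted by its responsibility.
    advantage = reward - baseline
    baseline += ALPHA_BASE * (reward - baseline)
    for m in range(N_MODULES):
        e = np.exp(policy_logits[m] - policy_logits[m].max())
        pi_m = e / e.sum()
        grad = -pi_m                 # d log pi(a) / d logits = one_hot(a) - pi
        grad[action] += 1.0
        policy_logits[m] += ALPHA_PI * resp[m] * advantage * grad

print("module predictors:", np.round(predictors, 2))
print("module policies (rows):")
print(np.round(np.exp(policy_logits) / np.exp(policy_logits).sum(axis=1, keepdims=True), 2))
```

With these assumptions, each module's predictor tends to specialize to one regime of the nonstationary environment, and its responsibility then gates which module's policy is exercised and updated; the actual paper should be consulted for the proposed algorithm.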

Citation (APA)

Hiei, Y., Mori, T., & Ishii, S. (2008). Self-organized reinforcement learning based on policy gradient in nonstationary environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5163 LNCS, pp. 367–376). https://doi.org/10.1007/978-3-540-87536-9_38
