We describe an on-line learning algorithm for attacking the fundamental credit assignment problem in non-stationary reactive environments. Reinforcement and pain are treated as special types of input to an agent living in the environment. The agent's only goal is to maximize cumulative reinforcement and to minimize cumulative pain. This simple goal may require producing complicated action sequences. Supervised learning techniques for recurrent networks serve to construct a differentiable model of the environmental dynamics, which includes a model of future reinforcement. While this model is being adapted, it is concurrently used for learning goal-directed behavior. The method extends work done
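The idea of adapting a differentiable model while concurrently using it to improve behavior can be illustrated with a deliberately tiny sketch. It is not the recurrent-network method of the paper: it assumes a scalar toy environment invented for illustration (the constants 0.9 and 0.5), a one-step model of pain that is linear in quadratic features, and a one-parameter linear controller. All names (`run_sketch`, `theta`, `k`, the learning rates, the clipping range) are hypothetical.

```python
import random


def run_sketch(steps=5000, seed=0):
    """Toy illustration: learn a pain model online and, concurrently,
    adjust a controller by differentiating the model's prediction."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]   # model weights for features [x*x, x*a, a*a]
    k = 0.0                   # controller gain: action = k * state
    mu, lr, eps = 0.5, 1.0, 1e-8
    x = 1.0                   # initial state (assumed)
    pains = []

    for _ in range(steps):
        a_det = k * x                         # controller's own action
        a = a_det + 0.1 * rng.gauss(0, 1)     # exploration noise (assumed)

        # Hidden toy environment, unknown to the agent (assumed dynamics).
        x_next = 0.9 * x + 0.5 * a
        pain = x_next ** 2                    # pain arrives as an input
        pains.append(pain)

        # Supervised (normalized LMS) update of the differentiable model.
        phi = [x * x, x * a, a * a]
        pred = sum(t * f for t, f in zip(theta, phi))
        err = pain - pred
        norm = eps + sum(f * f for f in phi)
        theta = [t + mu * err * f / norm for t, f in zip(theta, phi)]

        # Controller update: gradient of predicted pain w.r.t. k,
        # obtained by differentiating the model at the action a_det.
        dpain_da = theta[1] * x + 2.0 * theta[2] * a_det
        k -= lr * dpain_da * x
        # Safety clip that keeps this toy loop stable; not from the paper.
        k = max(-3.5, min(0.1, k))

        x = x_next + 0.1 * rng.gauss(0, 1)    # random disturbance (assumed)

    return k, theta, pains


if __name__ == "__main__":
    k, theta, pains = run_sketch()
    print("controller gain:", k)
    print("model weights:", theta)
```

In this toy setting the action that minimizes pain is `a = -1.8 * x`, so the gain `k` can be compared against that value. The sketch looks only one step ahead; handling long action sequences would require a recurrent model, as the abstract indicates.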
Schmidhuber, J. (1990). Reinforcement Learning with Interacting Continually Running Fully Recurrent Networks. In International Neural Network Conference (pp. 817–820). Springer Netherlands. https://doi.org/10.1007/978-94-009-0643-3_97