Solving Deep Memory POMDPs with Recurrent Policy Gradients

Abstract

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory (LSTM) architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task. © Springer-Verlag Berlin Heidelberg 2007.
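The core idea described in the abstract can be illustrated with a minimal REINFORCE-style sketch: an LSTM policy samples actions from its hidden state (its memory of past observations), and the gradient of the return-weighted log-likelihood is obtained by backpropagation through time. This is not the authors' exact algorithm, only an assumed approximation of the idea; the environment class TMazeEnv and its reset()/step() interface are hypothetical stand-ins for any deep-memory POMDP.

# A minimal sketch (assumption: PyTorch; not the paper's exact method):
# recurrent policy gradients via return-weighted log-likelihoods and BPTT.
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        # The LSTM state acts as the limited memory of past observations.
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)  # obs_seq: (1, T, obs_dim)
        return self.head(out), state


def reinforce_update(policy, optimizer, env, gamma=0.98):
    """Run one episode, then apply a return-weighted policy-gradient update."""
    obs, state, done = env.reset(), None, False   # env is hypothetical
    log_probs, rewards = [], []
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32).view(1, 1, -1)
        logits, state = policy(obs_t, state)
        dist = torch.distributions.Categorical(logits=logits[0, -1])
        action = dist.sample()
        obs, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted return-to-go for each time step.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    # Return-weighted log-likelihood; backward() carries the gradient
    # through time through the LSTM (backpropagation through time).
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In use, one would construct the policy and an optimizer (e.g. torch.optim.Adam(policy.parameters())) and call reinforce_update once per episode; the hidden state is never detached during the episode, so the unrolled computation graph spans all time steps and the update credits early memory-relevant decisions with later rewards.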

Citation (APA)

Wierstra, D., Foerster, A., Peters, J., & Schmidhuber, J. (2007). Solving deep memory POMDPs with recurrent policy gradients. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4668 LNCS, pp. 697–706). Springer-Verlag. https://doi.org/10.1007/978-3-540-74690-4_71
