Solving deep memory POMDPs with Recurrent Policy gradients

Daan Wierstra; Alexander Foerster; Jan Peters; Jürgen Schmidhuber

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4668 LNCS (Part 1), pp. 697-706 (2007)

DOI: 10.1007/978-3-540-74690-4_71

Abstract

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task. © Springer-Verlag Berlin Heidelberg 2007.
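
The gradient step described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it replaces the LSTM with a single tanh recurrent unit, assumes a two-action policy pi(a=1 | h_t) = sigmoid(v * h_t), a fixed baseline b, and discounted returns R_t = sum_{k>=t} gamma^(k-t) r_k. All function names are hypothetical. It computes the gradient of sum_t (R_t - b) log pi(a_t | h_t) for one recorded episode by weighting each step's characteristic eligibility with its return and backpropagating through time.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def returns_to_go(rewards, gamma):
    """Discounted return R_t from each time step t (assumed return definition)."""
    out, acc = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        out[t] = acc
    return out


def surrogate(params, obs, actions, returns, baseline=0.0):
    """Return-weighted log-likelihood: sum_t (R_t - b) * log pi(a_t | h_t)."""
    w, u, v = params
    h, total = 0.0, 0.0
    for x, a, R in zip(obs, actions, returns):
        h = math.tanh(w * h + u * x)          # hypothetical 1-unit recurrent state
        p = sigmoid(v * h)                    # probability of action 1
        total += (R - baseline) * math.log(p if a == 1 else 1.0 - p)
    return total


def rpg_gradient(params, obs, actions, returns, baseline=0.0):
    """Gradient of `surrogate` w.r.t. (w, u, v) via backpropagation through time."""
    w, u, v = params
    hs, ps = [0.0], []                        # hs[t] holds h_{t-1}; hs[0] = 0
    for x in obs:                             # forward pass, storing states
        h = math.tanh(w * hs[-1] + u * x)
        hs.append(h)
        ps.append(sigmoid(v * h))
    gw = gu = gv = 0.0
    dh_next = 0.0                             # dL/dh_t arriving from step t+1
    for t in reversed(range(len(obs))):
        h = hs[t + 1]
        # Eligibility d log pi(a_t|h_t) / d(v*h_t) = a_t - p_t, weighted by return.
        dz = (returns[t] - baseline) * (actions[t] - ps[t])
        gv += dz * h
        dh = dz * v + dh_next                 # total gradient reaching h_t
        dpre = dh * (1.0 - h * h)             # back through tanh
        gw += dpre * hs[t]
        gu += dpre * obs[t]
        dh_next = dpre * w                    # pass gradient to h_{t-1}
    return gw, gu, gv
```

Because actions and returns are fixed once the episode is recorded, the analytic gradient can be checked against central finite differences of `surrogate`; averaging `rpg_gradient` over many sampled episodes gives the Monte Carlo policy-gradient estimate used for a parameter update.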

Citation (APA)

Wierstra, D., Foerster, A., Peters, J., & Schmidhuber, J. (2007). Solving deep memory POMDPs with Recurrent Policy gradients. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4668 LNCS, pp. 697–706). Springer Verlag. https://doi.org/10.1007/978-3-540-74690-4_71