Safe Q-learning on complete history spaces


Abstract

In this article, we present an approach to solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze the conditions under which a history-based approach can learn policies comparable to the optimal solution on belief states. The algorithm presented is model-free and can be combined with any method for learning history spaces. We also present a procedure for learning history spaces that are especially suited to our Q-learning algorithm.
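
To make the abstract's idea concrete, the sketch below shows tabular Q-learning in which the state is a fixed-length window of recent observations and actions. This is a minimal illustration under assumptions not in the article: the class name `HistoryQLearner`, the fixed window length, and all hyperparameters are hypothetical, and the paper's actual algorithm learns the history space itself and includes the safety analysis discussed above rather than fixing a window in advance.

```python
import random
from collections import defaultdict, deque

class HistoryQLearner:
    """Tabular Q-learning over fixed-length observation-action histories.

    Hypothetical simplification: the history space here is a fixed window,
    whereas the paper learns a suitable history space.
    """

    def __init__(self, actions, history_len=3, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        # Alternating observations and actions; 2 * history_len entries
        # hold the last history_len (observation, action) pairs.
        self.history = deque(maxlen=2 * history_len)
        self.q = defaultdict(float)  # maps (history_tuple, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, obs):
        """Append the new observation, then act epsilon-greedily on Q."""
        self.history.append(obs)
        h = tuple(self.history)
        if random.random() < self.epsilon:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.q[(h, a)])
        self._last = (h, action)  # remembered for the next update()
        self.history.append(action)
        return action

    def update(self, reward, next_obs, done):
        """One Q-learning backup; call after the environment step."""
        h, a = self._last
        # History the agent will see at its next decision point.
        nh = deque(self.history, maxlen=self.history.maxlen)
        nh.append(next_obs)
        next_h = tuple(nh)
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_h, b)] for b in self.actions)
        self.q[(h, a)] += self.alpha * (target - self.q[(h, a)])
```

In a typical loop, one would call `select_action(obs)` to act, step the environment, and then call `update(reward, next_obs, done)`. The fixed window is the simplest stand-in for a complete history space; the article's contribution is precisely that the history representation can instead be learned and combined with this kind of model-free update.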

Citation (APA)

Timmer, S., & Riedmiller, M. (2007). Safe Q-learning on complete history spaces. In Lecture Notes in Computer Science (Vol. 4701 LNAI, pp. 394–405). Springer-Verlag. https://doi.org/10.1007/978-3-540-74958-5_37
