Abstract
In this article, we present an approach to solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze the conditions under which a history-based approach is able to learn policies comparable to the optimal solution on belief states. The algorithm presented is model-free and can be combined with any method for learning history spaces. We also present a procedure for learning history spaces that are especially well suited to our Q-learning algorithm.
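The abstract does not spell out the algorithm's details, but the core idea of Q-learning over a history space can be sketched as follows. This is a minimal illustrative sketch, not the authors' method: every name and parameter here (HistoryQLearner, history_len, the epsilon-greedy policy, and so on) is an assumption. The sketch replaces the belief state of a classical POMDP solver with a fixed-length sliding window of past action-observation pairs, which then plays the role of the state in an otherwise standard tabular Q-learning update.

```python
# Hypothetical sketch of Q-learning on a history space: the "state" is a
# fixed-length window of past (action, observation) pairs rather than a
# belief state. All names and hyperparameters are illustrative assumptions.
from collections import defaultdict, deque
import random

class HistoryQLearner:
    def __init__(self, actions, history_len=3, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions                     # discrete action set
        self.history = deque(maxlen=history_len)   # sliding window of (a, o) pairs
        self.q = defaultdict(float)                # Q[(history, action)] -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _key(self):
        # Histories must be hashable to index the Q-table.
        return tuple(self.history)

    def act(self):
        # Epsilon-greedy action selection conditioned on the current history.
        h = self._key()
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(h, a)])

    def update(self, action, observation, reward):
        # Standard Q-learning backup, with the history before the transition
        # as the current state and the extended history as the successor.
        h = self._key()
        self.history.append((action, observation))
        h_next = self._key()
        best_next = max(self.q[(h_next, a)] for a in self.actions)
        self.q[(h, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(h, action)]
        )
```

A sufficiently long window makes histories informative enough to stand in for belief states in deterministic POMDPs; choosing which histories to distinguish is exactly what the paper's history-learning procedure addresses.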
Citation
Timmer, S., & Riedmiller, M. (2007). Safe Q-learning on complete history spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4701 LNAI, pp. 394–405). Springer Verlag. https://doi.org/10.1007/978-3-540-74958-5_37