Abstract
In this article, we present an approach to solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze the conditions under which a history-based approach is able to learn policies comparable to the optimal solution on belief states. The algorithm presented is model-free and can be combined with any method for learning history spaces. We also present a procedure for learning history spaces that are especially well suited to our Q-learning algorithm.
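The abstract does not spell out the algorithm's details, but the core idea of Q-learning over a history space can be sketched as follows. This is a minimal illustrative sketch, not the authors' method: every name and parameter here (HistoryQLearner, history_len, the epsilon-greedy policy, and so on) is an assumption. The sketch replaces the belief state of a classical POMDP solver with a fixed-length sliding window of past action-observation pairs, which then plays the role of the state in an otherwise standard tabular Q-learning update.

```python
# Hypothetical sketch of Q-learning on a history space: the "state" is a
# fixed-length window of past (action, observation) pairs rather than a
# belief state. All names and hyperparameters are illustrative assumptions.
from collections import defaultdict, deque
import random

class HistoryQLearner:
    def __init__(self, actions, history_len=3, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions                     # discrete action set
        self.history = deque(maxlen=history_len)   # sliding window of (a, o) pairs
        self.q = defaultdict(float)                # Q[(history, action)] -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _key(self):
        # Histories must be hashable to index the Q-table.
        return tuple(self.history)

    def act(self):
        # Epsilon-greedy action selection conditioned on the current history.
        h = self._key()
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(h, a)])

    def update(self, action, observation, reward):
        # Standard Q-learning backup, with the history before the transition
        # as the current state and the extended history as the successor.
        h = self._key()
        self.history.append((action, observation))
        h_next = self._key()
        best_next = max(self.q[(h_next, a)] for a in self.actions)
        self.q[(h, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(h, action)]
        )
```

A sufficiently long window makes histories informative enough to stand in for belief states in deterministic POMDPs; choosing which histories to distinguish is exactly what the paper's history-learning procedure addresses.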
Citation
Timmer, S., & Riedmiller, M. (2007). Safe Q-learning on complete history spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4701 LNAI, pp. 394–405). Springer Verlag. https://doi.org/10.1007/978-3-540-74958-5_37