While direct, model-free reinforcement learning often outperforms model-based approaches in practice, only the latter have so far supported theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. This notion justifies a greedy strategy that, we believe, performs very well in practice, and it carries theoretical significance for deriving finite-sample convergence guarantees for direct reinforcement learning. We present empirical evidence supporting this view.
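To make the setup concrete, below is a minimal sketch of the algorithm the abstract names: tabular Q-learning with optimistic initial values, a constant learning rate, and purely greedy action selection, so that optimism rather than randomized exploration drives the agent. The environment interface (reset/step), the helper name greedy_q_learning, and all parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def greedy_q_learning(env, n_states, n_actions, episodes=500,
                      alpha=0.1, gamma=0.95, r_max=1.0):
    """Tabular Q-learning with optimistic initialization and a greedy policy.

    Assumes env.reset() -> state and env.step(a) -> (next_state, reward, done);
    this interface is a stand-in, not the paper's.
    """
    # Optimistic initialization: every Q-value starts at an upper bound
    # on achievable discounted return, Rmax / (1 - gamma).
    q = np.full((n_states, n_actions), r_max / (1.0 - gamma))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = int(np.argmax(q[s]))  # greedy: no explicit exploration step
            s_next, r, done = env.step(a)
            target = r if done else r + gamma * np.max(q[s_next])
            # Constant learning rate: alpha is fixed rather than decayed.
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q
```

Because unvisited state-action pairs retain their optimistic values, the greedy policy is repeatedly drawn toward them, which is the kind of implicit exploration the admissibility argument is meant to justify.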
Lim, S. H., & DeJong, G. (2005). Towards finite-sample convergence of direct reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 230–241). https://doi.org/10.1007/11564096_25