Reinforcement learning in finite MDPs: PAC analysis

Alexander L. Strehl; Lihong Li; Michael L. Littman

Journal Article

Journal of Machine Learning Research (2009) 10 2413-2444

ISSN: 15324435

186 Citations

196 Readers

Abstract

We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. A more refined analysis for upper and lower bounds is presented to yield insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX. © 2009 Alexander L. Strehl and Lihong Li and Michael L. Littman.

Author supplied keywords

Exploration
Markov decision processes
PAC-MDP
Reinforcement learning
Sample complexity

Cite

Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10, 2413–2444.
