Reinforcement learning and Markov decision processes

Martijn Van Otterlo; Marco Wiering

Book Chapter

Springer Verlag, (2012), 3-42

DOI: 10.1007/978-3-642-27645-3_1

351 Citations

546 Readers

Abstract

Situated between supervised and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision-making problems in which feedback is limited. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. First, the formal framework of Markov decision processes is defined, accompanied by the definition of value functions and policies. The main part of this text introduces foundational classes of algorithms for learning optimal behaviors, based on various definitions of optimality with respect to the goal of learning sequential decisions. Additionally, it surveys efficient extensions of the foundational algorithms, which differ mainly in the way feedback given by the environment is used to speed up learning and in the way they concentrate on relevant parts of the problem. For both model-based and model-free settings, these efficient extensions have proven useful in scaling up to larger problems.
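
As a rough illustration of the dynamic programming side of the abstract, the following is a minimal sketch of value iteration on a toy Markov decision process. It is not taken from the chapter: the two-state MDP, the table `P`, the discount `GAMMA`, and the helper names `q_value` and `value_iteration` are all hypothetical, chosen only to show how a value function and a greedy policy relate.

```python
# Hypothetical toy MDP (not from the chapter): 2 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-8):
    """Apply Bellman optimality backups until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the highest Q-value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P, GAMMA)
print(V, policy)
```

In this toy problem the model `P` is known, which is the dynamic programming setting. A reinforcement learning method would instead have to estimate values from sampled transitions.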

Cite

Van Otterlo, M., & Wiering, M. (2012). Reinforcement learning and Markov decision processes. In Adaptation, Learning, and Optimization (Vol. 12, pp. 3–42). Springer Verlag. https://doi.org/10.1007/978-3-642-27645-3_1
