Reinforcement learning and Markov decision processes

Abstract

Situated between supervised learning and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision-making problems in which there is limited feedback. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. First, the formal framework of Markov decision processes is defined, accompanied by the definition of value functions and policies. The main part of this text introduces foundational classes of algorithms for learning optimal behaviors, based on various definitions of optimality with respect to the goal of learning sequential decisions. Additionally, it surveys efficient extensions of the foundational algorithms, which differ mainly in how they use feedback from the environment to speed up learning and in how they concentrate on relevant parts of the problem. For both model-based and model-free settings, these efficient extensions have proven useful in scaling up to larger problems.
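
As a rough illustration of the dynamic programming side mentioned in the abstract, the sketch below runs value iteration on a made-up two-state, two-action Markov decision process. It is not taken from the chapter: the function name `value_iteration`, the `transitions` layout, the toy rewards, and the discount factor `gamma=0.9` are all assumptions chosen for illustration.

```python
def value_iteration(transitions, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Value iteration for a small tabular MDP (illustrative sketch).

    transitions[(s, a)] is a list of (probability, next_state, reward)
    tuples. This layout is an assumption made for the example.
    """

    def q_value(V, s, a):
        # Expected one-step return of taking action a in state s,
        # then following the values in V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[(s, a)])

    V = [0.0] * n_states
    while True:
        # Bellman optimality backup applied to every state.
        V_new = [
            max(q_value(V, s, a) for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the converged value function.
    policy = [
        max(range(n_actions), key=lambda a: q_value(V, s, a))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical toy MDP: action 0 moves to (or stays in) state 0 with reward 0;
# action 1 moves to (or stays in) state 1, paying 1 from state 0 and 2 from state 1.
transitions = {
    (0, 0): [(1.0, 0, 0.0)],
    (0, 1): [(1.0, 1, 1.0)],
    (1, 0): [(1.0, 0, 0.0)],
    (1, 1): [(1.0, 1, 2.0)],
}

V, policy = value_iteration(transitions, n_states=2, n_actions=2)
print(V, policy)
```

In this toy problem the values approach 19 for state 0 and 20 for state 1, and the greedy policy picks action 1 in both states. Value iteration needs the transition model, so it corresponds to the model-based setting; a model-free method would instead estimate values from sampled interaction.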

Cite

Van Otterlo, M., & Wiering, M. (2012). Reinforcement learning and Markov decision processes. In Adaptation, Learning, and Optimization (Vol. 12, pp. 3–42). Springer Verlag. https://doi.org/10.1007/978-3-642-27645-3_1
