This chapter briefly reviews some fundamental concepts, standard problem formulations, and classical algorithms of reinforcement learning (RL). Specifically, we first review Markov decision processes (MDPs) and dynamic programming (DP), which provide mathematical foundations for both the problem formulation and algorithm design for RL. Then we review some classical RL algorithms, such as Q-learning, Sarsa, policy gradient, and Thompson sampling. Finally, we provide a high-level review of the exploration schemes in RL and approximate solution methods for large-scale RL problems. At the end of this chapter, we also provide some pointers for further reading.
Wen, Z. (2022). Reinforcement Learning. In Springer Series in Supply Chain Management (Vol. 18, pp. 15–48). Springer Nature. https://doi.org/10.1007/978-3-031-01926-5_2