Constructing optimal dynamic treatment regimes for chronic disorders based on patient data is a problem of multi-stage decision making about the best sequence of treatments. This problem bears strong resemblance to the problem of reinforcement learning in computer science, a branch of machine learning that deals with the problem of multi-stage, sequential decision making by a learning agent. In this chapter, we review the necessary concepts of reinforcement learning, connect them to the relevant statistical literature, and develop a mathematical framework that will enable us to treat the problem of estimating the optimal dynamic treatment regimes rigorously.
CITATION STYLE
Chakraborty, B., & Moodie, E. E. M. (2013). Statistical Reinforcement Learning (pp. 31–52). https://doi.org/10.1007/978-1-4614-7428-9_3
Mendeley helps you to discover research relevant for your work.