Tree-based reinforcement learning for estimating optimal dynamic treatment regimes

Yebin Tao; Lu Wang; Daniel Almirall

Journal Article, Open Access

Annals of Applied Statistics (2018) 12(3) 1914-1938

DOI: 10.1214/18-AOAS1137

36 Citations

54 Readers

Abstract

Dynamic treatment regimes (DTRs) are sequences of treatment decision rules, in which treatment may be adapted over time in response to the changing course of an individual. Motivated by the substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the problem of optimization with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted estimators. For the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs, as shown in the simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
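To make the single-stage idea in the abstract concrete, here is a minimal sketch of how augmented inverse probability weighted (AIPW) pseudo-outcomes could drive one tree split across several treatments. It is an illustration under stated assumptions, not the authors' implementation: the function names `aipw_scores` and `best_split`, the toy data, the known propensity of 0.5, and the all-zero outcome-model predictions are all hypothetical. The split criterion shown (the average pseudo-outcome when each child node receives its own best treatment) is one plausible reading of "purity measure" and may differ from the paper's exact definition. The multi-stage backward induction is not shown.

```python
import numpy as np


def aipw_scores(y, a, propensity, outcome_pred):
    """Per-subject AIPW pseudo-outcomes, one column per treatment.

    y: (n,) outcomes; a: (n,) treatment labels in 0..K-1;
    propensity: (n, K) treatment probabilities;
    outcome_pred: (n, K) outcome-model predictions under each treatment.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    propensity = np.asarray(propensity, dtype=float)
    outcome_pred = np.asarray(outcome_pred, dtype=float)
    k = propensity.shape[1]
    # indicator[i, t] = 1 if subject i actually received treatment t
    indicator = (a[:, None] == np.arange(k)[None, :]).astype(float)
    weight = indicator / propensity
    # Inverse-probability-weighted outcome plus the augmentation term
    return weight * y[:, None] + (1.0 - weight) * outcome_pred


def best_split(x, scores, min_node=1):
    """Search thresholds on one covariate (assumed criterion, see lead-in).

    Returns (threshold, left_treatment, right_treatment, value), or None
    if no admissible split exists.
    """
    x = np.asarray(x)
    n = len(x)
    best = None
    for c in np.unique(x)[:-1]:
        left = x <= c
        if left.sum() < min_node or (~left).sum() < min_node:
            continue
        left_total = scores[left].sum(axis=0)
        right_total = scores[~left].sum(axis=0)
        # Each child is assigned the treatment with the largest total score
        value = (left_total.max() + right_total.max()) / n
        if best is None or value > best[3]:
            best = (float(c), int(left_total.argmax()),
                    int(right_total.argmax()), float(value))
    return best


# Toy data (hypothetical): treatment 0 is better when x <= 1, else treatment 1
x = np.array([0, 0, 1, 1, 2, 2, 3, 3])
a = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = np.array([3, 1, 3, 1, 1, 3, 1, 3], dtype=float)
propensity = np.full((8, 2), 0.5)   # assumed known, as in a randomized trial
outcome_pred = np.zeros((8, 2))     # deliberately crude outcome model

scores = aipw_scores(y, a, propensity, outcome_pred)
split = best_split(x, scores)
print(split)
```

Because the propensities are correct in this toy setup, the pseudo-outcomes remain usable even though the outcome model is crude, which is the robustness property the abstract attributes to combining the two models.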

Cite

Tao, Y., Wang, L., & Almirall, D. (2018). Tree-based reinforcement learning for estimating optimal dynamic treatment regimes. Annals of Applied Statistics, 12(3), 1914–1938. https://doi.org/10.1214/18-AOAS1137
