This chapter discusses a reduction of discounted continuous-time Markov decision processes (CTMDPs) to discrete-time Markov decision processes (MDPs). The reduction rests on the equivalence between a randomized policy that chooses actions only at jump epochs and a nonrandomized policy that may switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduced by the author in 2004 as a reduction to discounted MDPs. Here we show that the reduction also holds for unbounded jump and reward rates, although the corresponding MDP may no longer be discounted. Nevertheless, the analysis of the equivalent total-reward MDP yields a description of optimal policies for the CTMDP and provides methods for their computation.
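For the bounded-rate case mentioned above, the reduction follows the standard uniformization pattern. The sketch below uses notation not fixed by the abstract: transition rates q(y|x,a), jump rates q(x,a) bounded by Λ, reward rate r(x,a), and discount rate α > 0 are all assumed symbols.

```latex
% Sketch of the bounded-rate reduction (assumed notation: q(y|x,a) are
% transition rates, q(x,a) = \sum_{y \ne x} q(y|x,a) \le \Lambda < \infty
% are jump rates, r(x,a) is the reward rate, \alpha > 0 the discount rate).
\[
  \beta = \frac{\Lambda}{\alpha + \Lambda}, \qquad
  p(y \mid x, a) =
    \begin{cases}
      q(y \mid x, a)/\Lambda, & y \ne x,\\[2pt]
      1 - q(x, a)/\Lambda,    & y = x,
    \end{cases}
  \qquad
  \tilde{r}(x, a) = \frac{r(x, a)}{\alpha + \Lambda}.
\]
% The discounted CTMDP is value-equivalent to the discrete-time MDP with
% transition probabilities p, one-step rewards \tilde{r}, and discount
% factor \beta < 1.
```

When the jump rates are unbounded, no finite Λ exists, the discount factor cannot be bounded away from 1, and the equivalent discrete-time model becomes a total-reward MDP rather than a discounted one, consistent with the abstract's statement.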
Feinberg, E. A. (2012). Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs. In Systems and Control: Foundations and Applications (pp. 77–97). Birkhäuser. https://doi.org/10.1007/978-0-8176-8337-5_5