Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs

Abstract

This chapter discusses a reduction of discounted continuous-time Markov decision processes (CTMDPs) to discrete-time Markov decision processes (MDPs). This reduction is based on the equivalence of a randomized policy that chooses actions only at jump epochs to a nonrandomized policy that can switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduced by the author in 2004 as a reduction to discounted MDPs. Here we show that this reduction also holds for unbounded jump and reward rates, but the corresponding MDP may not be discounted. However, the analysis of the equivalent total-reward MDP leads to the description of optimal policies for the CTMDP and provides methods for their computation.
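To make the reduction concrete, the following is a minimal sketch of the kind of transformation the abstract describes, written in standard CTMDP notation; the symbols q, r, α and the tilde quantities are notational assumptions for illustration and are not taken verbatim from the chapter. For a CTMDP with transition rates q(j | x, a), total jump rate q(x, a) = Σ_{j ≠ x} q(j | x, a), reward rate r(x, a), and discount rate α > 0, one may pass to a discrete-time MDP with the same states and actions, substochastic transition probabilities, and one-step rewards

\[
\tilde p(j \mid x,a) \;=\; \frac{q(j \mid x,a)}{\alpha + q(x,a)}, \quad j \neq x,
\qquad
\tilde r(x,a) \;=\; \frac{r(x,a)}{\alpha + q(x,a)} .
\]

When the jump rates are bounded by a constant C, the surviving mass q(x, a)/(α + q(x, a)) stays bounded away from 1 and, after uniformization, the discrete-time model is a discounted MDP with discount factor C/(α + C), which corresponds to the bounded-rate case mentioned above. When the rates are unbounded, this factor can be arbitrarily close to 1, so the equivalent model is a total-reward MDP rather than a discounted one, consistent with the abstract.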

Citation (APA)

Feinberg, E. A. (2012). Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs. In Systems and Control: Foundations and Applications (pp. 77–97). Birkhäuser. https://doi.org/10.1007/978-0-8176-8337-5_5
