Variance Optimization for Continuous-Time Markov Decision Processes

Fu, Y.

Abstract

This paper considers the variance optimization problem for the average reward in continuous-time Markov decision processes (MDPs). The state space is assumed to be countable and the action space a Borel measurable space. The main purpose of the paper is to find the policy with minimal variance within the class of deterministic stationary policies. Unlike a standard MDP, under the variance criterion the cost function is affected by future actions. To address this, the variance minimization problem is converted into a standard MDP by introducing a concept called the pseudo-variance. Further, a policy iteration algorithm for the pseudo-variance optimization problem is given, from which the optimal policy of the original variance optimization problem is derived, together with a sufficient condition for a variance-optimal policy. Finally, an example illustrates the conclusions of the paper.
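The abstract does not spell out the pseudo-variance construction; what follows is a hedged sketch of a standard formulation from this line of work, not the paper's exact definitions. Writing $r(x_t, a_t)$ for the reward rate along a trajectory and $\eta^{\pi}$ for the long-run average reward of a policy $\pi$, the average reward and the variance criterion can be written as

$$\eta^{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\int_0^T r(x_t, a_t)\, dt\right], \qquad \sigma^2(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\int_0^T \bigl(r(x_t, a_t) - \eta^{\pi}\bigr)^2 dt\right].$$

Because $\eta^{\pi}$ depends on the whole policy, the instantaneous cost $(r - \eta^{\pi})^2$ at time $t$ depends on actions taken after $t$, which is what blocks standard dynamic programming. Replacing $\eta^{\pi}$ with a fixed constant $\lambda$ removes this coupling, and under the ergodicity assumptions typical of this setting,

$$\tilde{\sigma}^2_{\lambda}(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\int_0^T \bigl(r(x_t, a_t) - \lambda\bigr)^2 dt\right] = \sigma^2(\pi) + \bigl(\eta^{\pi} - \lambda\bigr)^2,$$

so minimizing the pseudo-variance is an ordinary average-cost MDP with cost function $(r(x, a) - \lambda)^2$, to which policy iteration applies.

For concreteness, below is a minimal Python sketch of average-cost policy iteration applied to such a pseudo-variance cost on a small finite CTMDP. Everything in it (the rate matrices Q, the reward table r, and the two-stage procedure that sets lam to the maximal average reward) is an illustrative assumption, not the paper's algorithm or its example.

import numpy as np

# Illustrative finite CTMDP (not from the paper): states 0..S-1, actions 0..A-1.
S, A = 3, 2
rng = np.random.default_rng(0)

# Q[a, x, y]: transition rate from x to y under action a; each row sums to zero.
Q = np.zeros((A, S, S))
for a in range(A):
    rates = rng.uniform(0.5, 2.0, size=(S, S))
    np.fill_diagonal(rates, 0.0)
    rates[np.arange(S), np.arange(S)] = -rates.sum(axis=1)
    Q[a] = rates

# r[x, a]: reward rate in state x under action a.
r = rng.uniform(0.0, 1.0, size=(S, A))

def evaluate(policy, cost):
    # Solve the Poisson equation c(x) - g + sum_y q(y|x) h(y) = 0 with h(0) = 0.
    Qd = Q[policy, np.arange(S)]           # generator rows under the policy
    cd = cost[np.arange(S), policy]
    M = np.zeros((S, S))
    M[:, :S - 1] = Qd[:, 1:]               # unknowns h(1), ..., h(S-1)
    M[:, S - 1] = -1.0                     # unknown gain g
    sol = np.linalg.solve(M, -cd)
    h = np.concatenate(([0.0], sol[:S - 1]))
    return sol[S - 1], h                   # gain, bias

def policy_iteration(cost, max_iter=100):
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(policy, cost)
        # Improvement step: pick the action minimizing c(x, a) + sum_y q(y|x, a) h(y).
        improved = np.array([int(np.argmin(cost[x] + Q[:, x] @ h)) for x in range(S)])
        if np.array_equal(improved, policy):
            break
        policy = improved
    return policy, g

# Stage 1 (assumed reference step): maximize average reward by minimizing -r.
_, neg_eta = policy_iteration(-r)
lam = -neg_eta

# Stage 2: minimize the pseudo-variance, an average-cost MDP with cost (r - lam)^2.
pi_var, pseudo_var = policy_iteration((r - lam) ** 2)
print("policy:", pi_var, "pseudo-variance:", pseudo_var)

The two-stage layout reflects the identity above: with lam fixed at a reference average reward, the pseudo-variance penalizes both the per-trajectory variance and any deviation of a policy's average reward from lam, so stage 2 reduces to a standard average-cost problem.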

Citation (APA)

Fu, Y. (2019). Variance Optimization for Continuous-Time Markov Decision Processes. Open Journal of Statistics, 9(2), 181–195. https://doi.org/10.4236/ojs.2019.92014
