This paper discusses a Markov decision process with constraints given by coherent risk measures. Risk-sensitive expected rewards under utility functions are approximated by weighted average value-at-risks, and the risk constraints are described by coherent risk measures. The coherent risk measures are represented as weighted average value-at-risks with the best risk spectrum derived from the decision maker’s risk-averse utility, so that the risk spectrum, used as a weighting, inherits the risk-averse property of that utility. To determine feasible ranges for the risk levels, a risk-minimizing problem is first solved by mathematical programming. Dynamic risk-sensitive reward maximization under risk constraints is then investigated. Dynamic programming cannot be applied to this dynamic optimization model, so other approaches are tried. A few numerical examples are given to illustrate the results.
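As a rough illustration of the central quantity, the sketch below estimates a weighted average value-at-risk from a finite sample of rewards. It assumes the common form, the integral over q in (0, 1) of VaR_q(X) weighted by a risk spectrum λ(q), with VaR_q taken as the empirical lower q-quantile. The paper's exact definitions may differ, and the function names, the midpoint-rule grid, and the example spectra are hypothetical choices made for this example, not taken from the paper.

```python
import math


def empirical_var(sorted_samples, q):
    """Empirical lower q-quantile (value-at-risk at level q) of a sorted reward sample."""
    n = len(sorted_samples)
    # Smallest order statistic whose empirical cumulative probability reaches q.
    k = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_samples[k]


def weighted_average_var(samples, spectrum, grid=1000):
    """Midpoint-rule estimate of the spectrum-weighted average of VaR_q over q in (0, 1).

    `spectrum` is an assumed risk spectrum lambda(q) >= 0; the weights are
    renormalized on the grid so that they sum to one.
    """
    xs = sorted(samples)
    qs = [(i + 0.5) / grid for i in range(grid)]
    weights = [spectrum(q) for q in qs]
    total = sum(weights)
    return sum(w * empirical_var(xs, q) for w, q in zip(weights, qs)) / total


def avar_spectrum(p):
    """Spectrum that recovers the average value-at-risk at level p: 1/p on (0, p], else 0."""
    return lambda q: 1.0 / p if q <= p else 0.0


rewards = [1.0, 2.0, 3.0, 4.0]
risk_neutral = weighted_average_var(rewards, lambda q: 1.0)           # flat spectrum: plain mean
lower_half = weighted_average_var(rewards, avar_spectrum(0.5))        # mean of the worst half
risk_averse = weighted_average_var(rewards, lambda q: 2.0 * (1 - q))  # decreasing spectrum
```

A flat spectrum reproduces the sample mean, while a nonincreasing spectrum places more weight on the low-reward quantiles and therefore yields a value no larger than the mean. This is the sense in which a spectrum can encode risk aversion.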
CITATION STYLE
Yoshida, Y. (2019). Risk-Sensitive Markov Decision Under Risk Constraints with Coherent Risk Measures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11676 LNAI, pp. 29–40). Springer Verlag. https://doi.org/10.1007/978-3-030-26773-5_3