Risk-Sensitive Piecewise-Linear Policy Iteration for Stochastic Shortest Path Markov Decision Processes


Abstract

A Markov Decision Process (MDP) is commonly used to model a sequential decision-making problem in which an agent interacts with an uncertain environment while seeking to minimize the expected cost accumulated along the process. If the process horizon is infinite, a discount factor γ ∈ [0, 1) indicates the importance the agent gives to future states. If the agent's mission is to reach a goal state, the process becomes a Stochastic Shortest Path MDP (SSP-MDP), the de facto model for probabilistic planning in AI. Although several efficient solutions have been proposed for SSP-MDPs, little research has considered "risk" in such processes. A Risk-Sensitive MDP (RS-MDP) models the agent's risk-averse and risk-prone attitudes by including a risk factor and a discount factor in the MDP definition. The convergence proofs of known dynamic-programming solutions adapted to RS-MDPs, such as risk-sensitive value iteration (VI) and risk-sensitive policy iteration (PI), rely on the discount factor. However, when solving an SSP-MDP we look for a proper policy, i.e., a policy that guarantees reaching the goal while minimizing the accumulated expected cost, which is naturally modeled without a discount factor. Moreover, it has been shown that the discount factor can modify the chosen risk attitude when solving a risk-sensitive SSP-MDP. Thus, in this work we formally prove the convergence of the PI algorithm for a risk-sensitive SSP-MDP based on operators that use a piecewise-linear transformation function, without a discount factor. We also run experiments in the benchmark River domain showing how the intended risk attitude, over an interval from extreme risk aversion to extreme risk proneness, varies with the discount factor, i.e., how an optimal policy for a risk-sensitive SSP-MDP can go from being risk-prone to risk-averse depending on the discount factor.
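To make the role of the piecewise-linear transformation concrete, the sketch below evaluates a fixed proper policy on a toy three-state chain SSP with no discount factor. It assumes a Mihatsch-and-Neuneier-style transform of the one-step temporal difference, in which a risk factor κ ∈ (−1, 1) re-weights worse-than-expected versus better-than-expected outcomes; the paper's exact operator, the sign convention for risk aversion, and all numbers used here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Piecewise-linear transformation of the one-step temporal difference,
# parameterized by a risk factor kappa in (-1, 1).  Worse-than-expected
# cost outcomes (positive TD) are weighted by (1 + kappa) and better-than-
# expected ones by (1 - kappa), so kappa > 0 behaves risk-averse and
# kappa < 0 risk-prone under this (assumed) sign convention.
def chi(x, kappa):
    return np.where(x > 0.0, (1.0 + kappa) * x, (1.0 - kappa) * x)

def evaluate_policy(P, c, goal, kappa, alpha=0.1, iters=5000):
    """Fixed-point iteration for the risk-sensitive evaluation condition
    0 = sum_{s'} P(s'|s) * chi(c(s) + V(s') - V(s), kappa) at non-goal states,
    with V(goal) = 0 and no discount factor (SSP setting)."""
    n = P.shape[0]
    V = np.zeros(n)
    for _ in range(iters):
        td = c[:, None] + V[None, :] - V[:, None]   # c(s) + V(s') - V(s)
        delta = (P * chi(td, kappa)).sum(axis=1)    # expected transformed TD
        delta[goal] = 0.0                           # goal is absorbing and cost-free
        V = V + alpha * delta
    return V

# Hypothetical 3-state chain: the fixed policy tries to move right toward the
# goal (state 2) but stays in place with probability 0.3 at each step.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.3, 0.7],
              [0.0, 0.0, 1.0]])
c = np.array([1.0, 1.0, 0.0])

for kappa in (-0.8, 0.0, 0.8):   # risk-prone, risk-neutral, risk-averse
    print(kappa, evaluate_policy(P, c, goal=2, kappa=kappa))
```

Under these assumptions, a positive κ pushes the evaluated cost-to-go of uncertain states above the risk-neutral value (pessimistic estimates) while a negative κ pushes it below, mirroring the risk-averse/risk-prone spectrum the paper studies, here with no discount factor involved.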

Citation (APA)

Dias Pastor, H., Oliveira Borges, I., Freire, V., Valdivia Delgado, K., & Nunes de Barros, L. (2020). Risk-Sensitive Piecewise-Linear Policy Iteration for Stochastic Shortest Path Markov Decision Processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12468 LNAI, pp. 383–395). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60884-2_28
