Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies


Abstract

The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust solution of MDPs with imprecisely specified rewards (imprecise-reward MDPs, or IRMDPs). We develop new techniques for the robust optimization of IRMDPs, using the minimax regret decision criterion, that exploit the set of nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function. Drawing parallels to POMDP value functions, we devise a Witness-style algorithm for identifying nondominated policies. We also examine several new algorithms for computing minimax regret using the nondominated set, and study, both practically and theoretically, the impact of approximating this set. Our results suggest that a small subset of the nondominated set can greatly speed up computation while still yielding very tight approximations to minimax regret.
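
To make the minimax regret criterion concrete, the following minimal Python sketch computes max regret and minimax regret over a finite candidate policy set and a finite sample of reward functions. It is an illustration only, not one of the paper's formulations: the precomputed values matrix, the reward sampling, and the restriction of the minimizing policy to the same finite candidate set are all simplifying assumptions introduced here.

def minimax_regret(values):
    """Minimax regret over finite sets (illustrative sketch only).

    values[i][j] is the (precomputed) expected value of candidate policy i
    under sampled reward function j. The candidate set is assumed to contain
    the nondominated policies, so the adversary's best response for each
    reward can be taken as a column-wise maximum; restricting the minimizing
    policy to the same finite set is a further simplification.
    """
    n_policies, n_rewards = len(values), len(values[0])

    # Adversary's best achievable value under each sampled reward.
    best = [max(values[i][j] for i in range(n_policies)) for j in range(n_rewards)]

    # Max regret of each candidate policy over the reward sample.
    regrets = [max(best[j] - values[i][j] for j in range(n_rewards))
               for i in range(n_policies)]

    # Candidate policy minimizing its max regret, and that regret value.
    k = min(range(n_policies), key=regrets.__getitem__)
    return k, regrets[k]


# Example: 3 candidate policies evaluated under 2 sampled rewards.
example = [[10.0, 2.0],
           [6.0, 6.0],
           [1.0, 9.0]]
print(minimax_regret(example))  # -> (1, 4.0): policy 1 attains minimax regret 4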

Citation (APA)
Regan, K., & Boutilier, C. (2010). Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010 (pp. 1127–1133). AAAI Press. https://doi.org/10.1609/aaai.v24i1.7740
