Bounded parameter Markov decision processes

Abstract

In this paper, we introduce the notion of a bounded parameter Markov decision process (BMDP) as a generalization of the familiar exact MDP. A bounded parameter MDP is a set of exact MDPs specified by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). BMDPs form an efficiently solvable special case of the already known class of MDPs with imprecise parameters (MDPIPs). Bounded parameter MDPs can be used to represent variation or uncertainty concerning the parameters of sequential decision problems in cases where no prior probabilities on the parameter values are available. Bounded parameter MDPs can also be used in aggregation schemes to represent the variation in the transition probabilities for different base states aggregated together in the same aggregate state. We introduce interval value functions as a natural extension of traditional value functions. An interval value function assigns a closed real interval to each state, representing the assertion that the value of that state falls within that interval. An interval value function can be used to bound the performance of a policy over the set of exact MDPs associated with a given bounded parameter MDP. We describe an iterative dynamic programming algorithm called interval policy evaluation, which computes an interval value function for a given BMDP and specified policy. Interval policy evaluation on a policy π computes the most restrictive interval value function that is sound, i.e., that bounds the value function for π in every exact MDP in the set defined by the bounded parameter MDP. We define optimistic and pessimistic notions of optimal policy, and provide a variant of value iteration [Bellman, 1957] that we call interval value iteration, which computes policies for a BMDP that are optimal in these senses.
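To make interval policy evaluation concrete, here is a minimal Python sketch. It is not the authors' code, and several details are assumptions. Rewards attach to states and lie in [r_lo[s], r_hi[s]]. Each transition probability for (state, action, successor) lies in [p_lo, p_hi], with the lower bounds summing to at most 1 and the upper bounds to at least 1. Values are discounted by a factor gamma < 1. Each bound is computed by a Bellman-style backup that, at every state, picks the feasible distribution that is worst (for the lower bound) or best (for the upper bound) for the current value estimate. It does this greedily: start from the lower bounds, then hand the remaining probability mass to the lowest-valued (or highest-valued) successors first. The function names `extreme_distribution` and `interval_policy_evaluation` are hypothetical.

```python
from typing import List, Tuple


def extreme_distribution(lo: List[float], hi: List[float],
                         values: List[float], minimize: bool) -> List[float]:
    """Return p with lo[j] <= p[j] <= hi[j] and sum(p) == 1 that minimizes
    (or maximizes) sum_j p[j] * values[j].

    Assumption: sum(lo) <= 1 <= sum(hi), so a feasible p exists.
    Greedy rule: begin at the lower bounds, then give the leftover mass to
    successors in order of increasing value (minimize) or decreasing value
    (maximize), each up to its upper bound.
    """
    p = list(lo)
    remaining = 1.0 - sum(lo)
    order = sorted(range(len(values)), key=lambda j: values[j],
                   reverse=not minimize)
    for j in order:
        if remaining <= 0.0:
            break
        extra = min(hi[j] - lo[j], remaining)
        p[j] += extra
        remaining -= extra
    return p


def interval_policy_evaluation(policy: List[int],
                               p_lo: List[List[List[float]]],
                               p_hi: List[List[List[float]]],
                               r_lo: List[float], r_hi: List[float],
                               gamma: float = 0.9, tol: float = 1e-9,
                               max_iter: int = 10_000) -> List[Tuple[float, float]]:
    """Iterate lower and upper Bellman backups for a fixed policy.

    p_lo[s][a][t] / p_hi[s][a][t] bound the probability of s -> t under a.
    Returns one (lower, upper) value interval per state.
    """
    n = len(policy)
    v_lo = [0.0] * n
    v_hi = [0.0] * n
    for _ in range(max_iter):
        new_lo: List[float] = []
        new_hi: List[float] = []
        for s in range(n):
            a = policy[s]
            # Pessimistic choice: mass flows toward low-valued successors.
            p_min = extreme_distribution(p_lo[s][a], p_hi[s][a], v_lo, True)
            # Optimistic choice: mass flows toward high-valued successors.
            p_max = extreme_distribution(p_lo[s][a], p_hi[s][a], v_hi, False)
            new_lo.append(r_lo[s] + gamma * sum(p * v for p, v in zip(p_min, v_lo)))
            new_hi.append(r_hi[s] + gamma * sum(p * v for p, v in zip(p_max, v_hi)))
        delta = max(max(abs(x - y) for x, y in zip(new_lo, v_lo)),
                    max(abs(x - y) for x, y in zip(new_hi, v_hi)))
        v_lo, v_hi = new_lo, new_hi
        if delta < tol:  # both backups have (numerically) converged
            break
    return list(zip(v_lo, v_hi))


if __name__ == "__main__":
    # Two states, one action. State 0 earns reward 1 and moves to itself
    # with probability in [0.2, 0.6] or to absorbing state 1 (reward 0)
    # with probability in [0.4, 0.8].
    result = interval_policy_evaluation(
        policy=[0, 0],
        p_lo=[[[0.2, 0.4]], [[0.0, 1.0]]],
        p_hi=[[[0.6, 0.8]], [[0.0, 1.0]]],
        r_lo=[1.0, 0.0], r_hi=[1.0, 0.0], gamma=0.9)
    print(result)
```

In the usage example, the lower bound for state 0 comes from staying with probability 0.2, giving 1 / (1 - 0.9 * 0.2) ≈ 1.2195. The upper bound comes from staying with probability 0.6, giving 1 / (1 - 0.9 * 0.6) ≈ 2.1739. The greedy fill is used because a linear objective over a box intersected with the probability simplex is optimized by this ordering.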

Citation (APA)

Givan, R., Leach, S., & Dean, T. (1997). Bounded parameter Markov decision processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1348 LNAI, pp. 234–246). Springer Verlag. https://doi.org/10.1007/3-540-63912-8_89
