We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification of previous techniques [2,3] used to prove equivalence with a fixed-point pseudometric on the state-space of a labelled Markov process and making heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels. © 2014 Springer International Publishing Switzerland.
CITATION STYLE
Ferns, N., Precup, D., & Knight, S. (2014). Bisimulation for Markov decision processes through families of functional expressions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8464 LNCS, pp. 319–342). Springer Verlag. https://doi.org/10.1007/978-3-319-06880-0_17
Mendeley helps you to discover research relevant for your work.