We propose a general definition of composition operator on Markov Decision Processes with rewards (MDPs) and identify a well-behaved class of operators, called safe, that are guaranteed to be non-extensive with respect to the bisimilarity pseudometrics of Ferns et al. [10], which measure behavioral similarities between MDPs. For MDPs built using safe/non-extensive operators, we present the first method that exploits the structure of the system to compute the bisimilarity distance exactly. Experimental results show significant improvements over the non-compositional technique. © 2013 Springer-Verlag.
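For orientation, the bisimilarity pseudometric referred to in the abstract is usually presented as the least fixed point of a functional on state distances. The block below is a sketch of that standard formulation, not the paper's own statement: the symbols $S$, $A$, $r$, $\tau$, the weights $c_R$, $c_T$, and the coupling set $\Omega(\mu,\nu)$ are notation assumed here for illustration and may differ from the notation used by the authors.

```latex
% Assumed notation: finite state set S, action set A, reward r(s,a),
% transition distribution \tau(s,a) over S, weights c_R, c_T >= 0 with c_R + c_T <= 1.
\[
  F(d)(s,t) \;=\; \max_{a \in A}
  \Bigl( c_R \,\bigl| r(s,a) - r(t,a) \bigr|
       \;+\; c_T \, \mathcal{K}_d\bigl(\tau(s,a), \tau(t,a)\bigr) \Bigr)
\]
% Kantorovich distance between distributions \mu and \nu, lifted from d:
% \Omega(\mu,\nu) is the set of couplings, i.e. joint distributions on S x S
% whose marginals are \mu and \nu.
\[
  \mathcal{K}_d(\mu,\nu) \;=\; \min_{\omega \in \Omega(\mu,\nu)}
  \sum_{u,v \in S} \omega(u,v)\, d(u,v)
\]
% The bisimilarity pseudometric is the least d satisfying d = F(d).
```

Under this reading, two states are at distance zero exactly when they are bisimilar, and computing the distance amounts to solving for the least fixed point, where each evaluation of $\mathcal{K}_d$ is a transportation problem.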
Bacci, G., Bacci, G., Larsen, K. G., & Mardare, R. (2013). Computing behavioral distances, compositionally. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8087 LNCS, pp. 74–85). https://doi.org/10.1007/978-3-642-40313-2_9