Long-term values in Markov decision processes, (Co)algebraically


Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, which are similar to MDPs but lack rewards, have been studied extensively, including coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
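For background, the classical fixed-point setting that these results refine can be sketched as follows (a standard textbook formulation, not the paper's (co)algebraic definitions, which differ in presentation). For a fixed policy \(\pi\), reward function \(r\), transition probabilities \(p\), and discount factor \(0 \le \gamma < 1\), the Bellman operator

\[
  (T_\pi V)(s) \;=\; r(s, \pi(s)) \;+\; \gamma \sum_{s'} p(s' \mid s, \pi(s))\, V(s')
\]

is a \(\gamma\)-contraction on bounded value functions in the supremum metric,

\[
  \lVert T_\pi V - T_\pi W \rVert_\infty \;\le\; \gamma\, \lVert V - W \rVert_\infty,
\]

so Banach's Fixpoint Theorem yields the long-term discounted value \(V^\pi\) as its unique fixed point. The Kantorovich lifting mentioned above equips distributions \(\mu, \nu\) over a metric space \((X, d)\) with the (dual, Kantorovich–Rubinstein) distance

\[
  K(d)(\mu, \nu) \;=\; \sup \{\, \lvert \mathbb{E}_\mu f - \mathbb{E}_\nu f \rvert : f \colon X \to \mathbb{R} \text{ nonexpansive} \,\}.
\]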

Citation (APA)

Feys, F. M. V., Hansen, H. H., & Moss, L. S. (2018). Long-term values in Markov decision processes, (co)algebraically. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11202 LNCS, pp. 78–99). Springer Verlag. https://doi.org/10.1007/978-3-030-00389-0_6
