A framework for computing bounds for the return of a policy

0Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge - namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Pǎduraru, C., Precup, D., & Pineau, J. (2012). A framework for computing bounds for the return of a policy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7188 LNAI, pp. 201–212). https://doi.org/10.1007/978-3-642-29946-9_21

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free