Local policy search in a convex space and conservative policy iteration as boosted policy search

Bruno Scherrer; Matthieu Geist

Conference Proceedings, Open Access


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8726 LNAI(PART 3) 35-50

DOI: 10.1007/978-3-662-44845-8_3

21 citations

14 readers

Abstract

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. In general, the best one can hope for from such an approach is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural way to alleviate this issue is to derive an algorithm that solves the local policy search problem using a boosting approach (constrained to the convex hull of the policy space). The resulting algorithm turns out to be a slight generalization of conservative policy iteration; thus, our second contribution is to highlight an original connection between local policy search and approximate dynamic programming. © 2014 Springer-Verlag.
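As a rough illustration of the mixture update behind conservative policy iteration, the sketch below repeatedly moves a tabular stochastic policy a small step toward a greedy policy, so every iterate stays in the convex hull of the policies seen so far, and tracks the value averaged over a fixed state distribution. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state MDP (`P`, `R`), the distribution `NU`, the exact greedy step, and the fixed step size `alpha`. The algorithm in the literature uses an approximate greedy step and a more careful choice of step size.

```python
GAMMA = 0.9                 # discount factor (illustrative)
N_STATES, N_ACTIONS = 2, 2

# Hypothetical toy MDP: P[s][a][t] is the probability of moving from
# state s to state t under action a; R[s][a] is the immediate reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [[0.0, 0.0], [0.0, 1.0]]
NU = [0.5, 0.5]             # predefined state distribution for the objective


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation; returns (q, v) for a stochastic policy."""
    v = [0.0] * N_STATES
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        q = [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_STATES))
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
        v = [sum(policy[s][a] * q[s][a] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return q, v


def objective(policy):
    """Value function of the policy averaged over NU."""
    _, v = evaluate(policy)
    return sum(NU[s] * v[s] for s in range(N_STATES))


def greedy(q):
    """Deterministic policy that picks the best action under q in each state."""
    policy = []
    for row in q:
        best = max(range(len(row)), key=lambda a: row[a])
        policy.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return policy


def mix(old, new, alpha):
    """Convex combination (1 - alpha) * old + alpha * new, state by state."""
    return [[(1.0 - alpha) * o + alpha * n for o, n in zip(old_row, new_row)]
            for old_row, new_row in zip(old, new)]


def cpi(policy, alpha=0.2, iterations=50):
    """Conservative updates: a small step toward the greedy policy each time."""
    for _ in range(iterations):
        q, _ = evaluate(policy)
        policy = mix(policy, greedy(q), alpha)
    return policy


uniform = [[0.5, 0.5], [0.5, 0.5]]
improved = cpi(uniform)
print(objective(uniform), objective(improved))
```

With an exact greedy step, each mixture has nonnegative advantage over the previous policy in every state, so the averaged value cannot decrease from one iterate to the next in this toy setting.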

Citation (APA)

Scherrer, B., & Geist, M. (2014). Local policy search in a convex space and conservative policy iteration as boosted policy search. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8726 LNAI, pp. 35–50). Springer Verlag. https://doi.org/10.1007/978-3-662-44845-8_3
