ℓ1-penalized projected Bellman residual

Abstract

We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists in combining the Least-Squares Temporal Difference (LSTD) algorithm with ℓ1-regularization, which has proven effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an ℓ1-penalized projection and solves the corresponding fixed-point problem. However, that approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an ℓ1-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. This comes at the cost of a higher computational complexity, even when only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which suggests easy extensions to other penalties. © 2012 Springer-Verlag.
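To make the reduction to supervised learning concrete, the sketch below illustrates one way the ℓ1-penalized projected Bellman residual can be cast as an ordinary Lasso problem. It is a minimal sketch under assumptions consistent with the abstract, not the authors' implementation: the variable names (phi, phi_next, r, gamma, lam) and the synthetic data are hypothetical, and scikit-learn's Lasso solver stands in for whatever optimizer the paper actually uses.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical sampled quantities (illustrative names, not from the paper):
#   phi      : (n, p) feature matrix of sampled states
#   phi_next : (n, p) feature matrix of successor states
#   r        : (n,)   sampled rewards
#   gamma    : discount factor, lam : l1-penalty level
rng = np.random.default_rng(0)
n, p, gamma, lam = 200, 10, 0.95, 0.1
phi = rng.normal(size=(n, p))
phi_next = rng.normal(size=(n, p))
r = rng.normal(size=n)

# Orthogonal projection onto span(phi): Pi M = phi (phi^T phi)^{-1} phi^T M,
# computed via least squares to avoid forming the inverse explicitly.
def project(M):
    coeffs, *_ = np.linalg.lstsq(phi, M, rcond=None)
    return phi @ coeffs

# l1-penalized projected Bellman residual:
#   min_w  1/2 || Pi(r + gamma * phi_next w) - phi w ||^2 + lam * ||w||_1
# Since Pi(r + gamma * phi_next w) - phi w = Pi r - (phi - gamma * Pi phi_next) w,
# this is a standard Lasso with design X_tilde and target y_tilde.
X_tilde = phi - gamma * project(phi_next)
y_tilde = project(r)

# scikit-learn's Lasso minimizes 1/(2n) ||y - Xw||^2 + alpha * ||w||_1,
# so the penalty is rescaled accordingly.
lasso = Lasso(alpha=lam / n, fit_intercept=False)
lasso.fit(X_tilde, y_tilde)
w = lasso.coef_  # sparse weights; the value function is V(s) ~ phi(s)^T w
```

Because the problem is a plain Lasso once X_tilde and y_tilde are formed, any sparse regression solver (or a different penalty) could be substituted at that last step, which is the flexibility the abstract alludes to.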

Citation (APA)

Geist, M., & Scherrer, B. (2012). ℓ1-penalized projected Bellman residual. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7188 LNAI, pp. 89–101). https://doi.org/10.1007/978-3-642-29946-9_12
