Model selection in reinforcement learning

Abstract

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BErMin, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, which selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BErMin leads to a procedure whose rate of convergence matches, up to a constant factor, that of an oracle that knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity. © The Author(s) 2011.
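
The following is a minimal, hypothetical Python sketch of the generic complexity-regularization recipe the abstract describes: score each candidate action-value function by an empirical error estimate plus a complexity penalty, then pick the minimizer. All names here (td_error_proxy, select_candidate) and the penalty values are illustrative assumptions, and the squared TD-error proxy used below is not the paper's Bellman-error estimator, which is constructed more carefully in order to obtain the oracle guarantee.

```python
import numpy as np

def td_error_proxy(q, transitions, gamma):
    """Average squared TD error of candidate action-value function q.

    q           : callable (state, action) -> float
    transitions : iterable of (s, a, r, s_next, next_actions) tuples
    gamma       : discount factor in [0, 1)

    NOTE: the squared TD error is only a biased proxy for the Bellman
    error; handling this bias is part of what BErMin itself addresses.
    """
    errs = [
        (q(s, a) - (r + gamma * max(q(s_next, b) for b in next_actions))) ** 2
        for s, a, r, s_next, next_actions in transitions
    ]
    return float(np.mean(errs))

def select_candidate(q_candidates, penalties, transitions, gamma=0.99):
    """Return the index minimizing estimated error + complexity penalty."""
    scores = [
        td_error_proxy(q, transitions, gamma) + pen
        for q, pen in zip(q_candidates, penalties)
    ]
    return int(np.argmin(scores))

# Toy usage: two constant candidates scored on two fabricated transitions.
if __name__ == "__main__":
    data = [(0, 0, 1.0, 1, (0, 1)), (1, 1, 0.0, 0, (0, 1))]
    candidates = [lambda s, a: 0.0, lambda s, a: 5.0]
    print(select_candidate(candidates, [0.01, 0.02], data, gamma=0.9))
```

In this recipe the penalties would typically grow with the capacity of each candidate's function space, so the selection trades off empirical fit against complexity, which is the mechanism behind the adaptivity result stated in the abstract.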

Citation (APA)

Farahmand, A. M., & Szepesvári, C. (2011). Model selection in reinforcement learning. Machine Learning, 85(3), 299–332. https://doi.org/10.1007/s10994-011-5254-7
