Sample complexity and performance bounds for non-parametric approximate linear programming

Abstract

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
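To make the smoothness assumption concrete, here is a minimal sketch of an NP-ALP-style linear program; the notation is ours for illustration and is simplified to state values, whereas the paper works with state-action values and a more refined formulation. Given n sampled transitions (s_i, a_i, r_i, s'_i), a discount factor gamma, an assumed Lipschitz constant L, and a metric d over states, one plausible LP over the sampled values v_1, ..., v_n is:

\begin{aligned}
\max_{v_1,\dots,v_n} \quad & \sum_{i=1}^{n} v_i \\
\text{s.t.} \quad & v_i \le r_i + \gamma \left( v_j + L\, d(s'_i, s_j) \right) && \forall\, i, j \quad \text{(sampled Bellman constraints)} \\
& v_i - v_j \le L\, d(s_i, s_j) && \forall\, i, j \quad \text{(Lipschitz smoothness)}
\end{aligned}

with the value at any unsampled state s recovered non-parametrically as \tilde{V}(s) = \min_j \left[ v_j + L\, d(s, s_j) \right]. Because \tilde{V} is a pointwise minimum of the affine terms v_j + L\, d(\cdot, s_j), the constraint v_i \le r_i + \gamma \tilde{V}(s'_i) decomposes into the family of linear constraints shown above, so the whole program remains a linear program with no parametric basis functions, which is what lets a smoothness assumption alone stand in for an approximation architecture.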

Citation (APA)

Pazis, J., & Parr, R. (2013). Sample complexity and performance bounds for non-parametric approximate linear programming. In Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013 (pp. 782–788). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v27i1.8696
