Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

Abstract

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters – the rewards and transitions – is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has been recently popularized as meta-RL, is to train the agent on a sample of N problem instances from the prior, with the hope that for large enough N, good generalization behavior to an unseen test instance will be obtained. In this work, we study generalization in Bayesian RL under the probably approximately correct (PAC) framework, using the method of algorithmic stability. Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss – an approach that is not suitable for RL as Markov decision processes (MDPs) are not convex. Instead, building on recent results of fast convergence rates for mirror descent in regularized MDPs, we show that regularized MDPs satisfy a certain quadratic growth criterion, which is sufficient to establish stability. This result, which may be of independent interest, allows us to study the effect of regularization on generalization in the Bayesian RL setting.
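
For concreteness, the objects discussed in the abstract can be sketched as follows. This is a schematic rendering under standard definitions, not the paper's own notation: P denotes the prior over problem instances, M_1, ..., M_N the sampled training instances, lambda the regularization weight (an entropy regularizer is assumed here purely for illustration), and mu the quadratic-growth constant.

% Sketch (assumed notation, not taken verbatim from the paper): the empirical
% meta-RL objective over N instances drawn i.i.d. from the prior P, with an
% entropy regularizer of weight \lambda.
\[
  \hat{J}_{N,\lambda}(\pi) \;=\; \frac{1}{N}\sum_{i=1}^{N}
    \mathbb{E}_{\pi,\,M_i}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
      \bigl( r_i(s_t,a_t) + \lambda\,\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \bigr)\right],
  \qquad M_1,\dots,M_N \overset{\text{i.i.d.}}{\sim} P .
\]
% Quadratic growth around the regularized optimum \pi^*_\lambda: the drop in
% regularized return grows at least quadratically with the distance from
% \pi^*_\lambda (in an appropriate norm),
\[
  \hat{J}_{N,\lambda}(\pi^*_{\lambda}) - \hat{J}_{N,\lambda}(\pi)
  \;\ge\; \frac{\mu}{2}\,\bigl\lVert \pi - \pi^*_{\lambda} \bigr\rVert^{2}
  \quad \text{for some } \mu > 0 .
\]
% This condition is weaker than strong concavity of the objective but, as the
% abstract notes, suffices to establish the algorithmic stability used in the
% PAC generalization argument.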

Citation (APA)

Tamar, A., Soudry, D., & Zisselman, E. (2022). Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 8423–8431). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i8.20818
