We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy with an exploring policy yields minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than the worst case.
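The following is a loose illustrative sketch of the idea of switching between a Bayes-optimal policy and an exploring policy, using a toy two-armed Bernoulli bandit in place of the paper's general history-based setting. The names `posterior_std` and `explore_threshold`, and the use of posterior standard deviation as the trigger for exploration, are assumptions made for illustration and are not the paper's actual algorithm or criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the general history-based environment: a two-armed
# Bernoulli bandit with a Beta(alpha, beta) posterior over each arm's mean.
alpha = np.ones(2)
beta = np.ones(2)

def posterior_std(a, b):
    # Standard deviation of a Beta(a, b) posterior -- a crude proxy for the
    # value of further exploration (hypothetical choice, not from the paper).
    return np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def choose_arm(explore_threshold=0.1):
    stds = posterior_std(alpha, beta)
    if stds.max() > explore_threshold:
        # Exploring policy: query the arm we are most uncertain about.
        return int(np.argmax(stds))
    # Bayes(-greedy) policy: exploit the arm with the highest posterior mean.
    return int(np.argmax(alpha / (alpha + beta)))

true_means = np.array([0.4, 0.7])
for t in range(500):
    arm = choose_arm()
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

In this toy version the agent explores only while the posterior is still uncertain and then settles on the Bayes-greedy arm; the paper develops this combination rigorously for general environments and proves the resulting sample-complexity bounds.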
Lattimore, T., & Hutter, M. (2014). Bayesian reinforcement learning with exploration. In Lecture Notes in Computer Science (Vol. 8776, pp. 170–184). Springer. https://doi.org/10.1007/978-3-319-11662-4_13