Abstract
Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this uncertainty into exploratory behavior. The approach is applied to two-dimensional mazes with moveable barriers, and its performance is compared with Sutton's DYNA system. © 1996 Kluwer Academic Publishers.
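The abstract does not give the model itself, so the following is only a minimal sketch of the two components under stated assumptions, not the paper's method. It contrasts a heuristic bonus of the form κ√τ, as used in the Dyna-Q+ variant of DYNA (Sutton, 1990), with a bonus derived from an explicit statistical model. The model here is hypothetical: each maze barrier is either open or blocked and flips independently with probability `p_flip` on every time step, so after τ steps without an observation the chance that it has changed is (1 − (1 − 2p)^τ)/2. The names `dyna_q_plus_bonus`, `prob_changed` and `certainty_equivalent_value` are illustrative only.

```python
import math


def dyna_q_plus_bonus(kappa: float, tau: int) -> float:
    """Heuristic Dyna-Q+ style bonus: grows with the square root of the
    number of steps tau since the state-action pair was last tried."""
    return kappa * math.sqrt(tau)


def prob_changed(p_flip: float, tau: int) -> float:
    """Probability that a two-state (open/blocked) barrier differs from its
    last observed state after tau unobserved steps.

    Hypothetical change model: the barrier flips independently with
    probability p_flip on every step (a symmetric two-state Markov chain).
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * p_flip) ** tau)


def certainty_equivalent_value(p_flip: float, tau: int, last_seen_open: bool,
                               value_if_open: float,
                               value_if_blocked: float) -> float:
    """Value of trying to pass the barrier when the uncertain barrier state
    is replaced by its current mean estimate (plug-in / certainty-equivalent
    estimate under the hypothetical flip model above)."""
    q = prob_changed(p_flip, tau)
    p_open = (1.0 - q) if last_seen_open else q
    return p_open * value_if_open + (1.0 - p_open) * value_if_blocked


if __name__ == "__main__":
    # A barrier last seen blocked becomes more attractive to probe the longer
    # it goes unobserved, saturating once p_open approaches 1/2.
    for tau in (0, 1, 10, 100):
        print(tau,
              round(dyna_q_plus_bonus(0.5, tau), 3),
              round(certainty_equivalent_value(0.01, tau, False, 10.0, 0.0), 3))
```

The design difference is the point of the sketch: the √τ bonus keeps growing without bound, whereas a bonus computed from a model of how the world changes saturates once the barrier's state is maximally uncertain.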
Citation
Dayan, P., & Sejnowski, T. J. (1996). Exploration bonuses and dual control. Machine Learning, 25(1), 5–22. https://doi.org/10.1007/bf00115298