An intrinsic reward mechanism for efficient exploration

57 citations · 126 Mendeley readers

Abstract

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm's use for learning a policy for a skill given its reward function - an important but neglected component of skill discovery.

Citation (APA)

Şimşek, Ö., & Barto, A. G. (2006). An intrinsic reward mechanism for efficient exploration. In ACM International Conference Proceeding Series (Vol. 148, pp. 833–840). https://doi.org/10.1145/1143844.1143949
