An intrinsic reward mechanism for efficient exploration

  • Özgür Şimşek
  • Andrew G. Barto

  • 108 Mendeley users have this article in their library.
  • 30 citations of this article.


How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm's use for learning a policy for a skill given its reward function---an important but neglected component of skill discovery.
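The idea of an agent driven purely by an internally generated reward can be illustrated with a generic count-based novelty bonus. This is a minimal sketch of that general idea, not the paper's specific mechanism: the agent walks a deterministic chain of states and greedily moves toward whichever neighboring state has the smaller visit count, so its "reward" comes entirely from its own exploration progress.

```python
import math
from collections import defaultdict

def novelty_driven_exploration(n_states=8, steps=64):
    """Toy agent on a chain of states 0..n_states-1 that acts greedily on an
    intrinsic count-based novelty bonus: less-visited successor states yield
    a higher bonus, so the agent seeks them out without any external reward.
    (A generic stand-in for intrinsic motivation, not the paper's method.)"""
    visits = defaultdict(int)
    s = 0
    visits[s] += 1
    for _ in range(steps):
        def bonus(a):
            # Intrinsic value of taking action a: 1/sqrt(1 + visit count)
            # of the resulting state, so novelty decays with familiarity.
            s2 = min(max(s + a, 0), n_states - 1)
            return 1.0 / math.sqrt(1 + visits[s2])
        a = max((1, -1), key=bonus)  # ties prefer moving right
        s = min(max(s + a, 0), n_states - 1)
        visits[s] += 1
    return dict(visits)

counts = novelty_driven_exploration()
```

Because the bonus shrinks as a state is revisited, the greedy agent sweeps back and forth across the chain and covers every state, which is the behavior one wants from exploration that is rewarded intrinsically rather than by the task.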


