We present a new approach for exploration in Reinforcement Learning (RL) based on certain properties of Markov Decision Processes (MDPs). Our strategy encourages a more uniform visitation of the state space, more extensive sampling of actions whose action-value estimates have potentially high variance, and a focus on states where the agent has the most control over the outcomes of its actions. The strategy can be combined with other existing exploration techniques, and we demonstrate experimentally that it can improve the performance of both undirected and directed exploration methods. In contrast to other directed methods, the exploration-relevant information can be precomputed before learning and then used during learning without additional computational cost.
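As a minimal sketch of how these three criteria might be combined, the snippet below computes a hypothetical exploration bonus from tabular visit counts, per-pair variance estimates of the action values, and a precomputed per-state controllability score. The weights, the linear combination, and the epsilon-greedy wrapper are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def exploration_score(visits, q_variance, controllability, state, action,
                      w_visit=1.0, w_var=1.0, w_ctrl=1.0):
    """Hypothetical exploration bonus combining the three criteria from the
    abstract: visitation uniformity, variance of the action-value estimate,
    and a precomputed per-state controllability measure (the weights are
    illustrative assumptions)."""
    visit_bonus = 1.0 / (1.0 + visits[state, action])   # favour rarely tried actions
    var_bonus = q_variance[state, action]                # favour uncertain value estimates
    ctrl_bonus = controllability[state]                  # favour states the agent can influence
    return w_visit * visit_bonus + w_var * var_bonus + w_ctrl * ctrl_bonus

def select_action(q_values, visits, q_variance, controllability, state,
                  epsilon=0.1, rng=None):
    """Epsilon-greedy-style choice in which the exploratory draw is biased by
    the exploration score instead of being uniform over actions."""
    rng = rng if rng is not None else np.random.default_rng()
    n_actions = q_values.shape[1]
    if rng.random() < epsilon:
        scores = np.array([exploration_score(visits, q_variance,
                                             controllability, state, a)
                           for a in range(n_actions)])
        return int(rng.choice(n_actions, p=scores / scores.sum()))
    return int(np.argmax(q_values[state]))
```

The controllability array here stands in for the exploration-relevant MDP characteristics that the abstract says can be precomputed; only the visit counts and variance estimates would need to be maintained during learning.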
Ratitch, B., & Precup, D. (2003). Using MDP characteristics to guide exploration in reinforcement learning. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2837, pp. 313–324). Springer Verlag. https://doi.org/10.1007/978-3-540-39857-8_29