Using MDP characteristics to guide exploration in reinforcement learning


Abstract

We present a new approach for exploration in Reinforcement Learning (RL) based on certain properties of Markov Decision Processes (MDPs). Our strategy promotes more uniform visitation of the state space and more extensive sampling of actions whose action-value estimates have potentially high variance, and it encourages the RL agent to focus on states where it has the most control over the outcomes of its actions. The strategy can be used in combination with existing exploration techniques, and we demonstrate experimentally that it improves the performance of both undirected and directed exploration methods. In contrast to other directed methods, the exploration-relevant information can be precomputed before learning and then used during learning at no additional computational cost.
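The abstract does not spell out how these MDP characteristics are combined into an action-selection rule, so the following is a minimal, hypothetical sketch rather than the authors' method. It assumes a per-state-action exploration bonus built from three illustrative ingredients: a visit-count novelty term (encouraging uniform state-space coverage), the empirical variance of action-value samples, and a precomputed per-state controllability score. All names and weights (w_novelty, w_var, w_ctrl) and the epsilon-greedy combination are assumptions for illustration.

    import numpy as np

    def exploration_bonus(q_samples, visit_counts, controllability,
                          w_novelty=1.0, w_var=1.0, w_ctrl=1.0):
        # q_samples:       (n_actions, n_estimates) samples of Q(s, a)
        # visit_counts:    (n_actions,) visit counts for each action in s
        # controllability: scalar precomputed offline for state s
        #                  (assumption: higher = the agent's actions have
        #                  more influence over outcomes in this state)
        novelty = 1.0 / np.sqrt(visit_counts + 1.0)   # favor rarely tried actions
        variance = q_samples.var(axis=1)              # favor uncertain estimates
        return w_novelty * novelty + w_var * variance + w_ctrl * controllability

    def choose_action(q_means, bonus, epsilon=0.1, rng=None):
        # Combine the directed bonus with a standard undirected scheme
        # (epsilon-greedy), since the abstract notes the strategy can be
        # layered on top of existing exploration techniques.
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < epsilon:
            return int(np.argmax(bonus))   # exploratory step, guided by the bonus
        return int(np.argmax(q_means))     # greedy step

Because the controllability term in this sketch depends only on the MDP's transition structure, it can be computed once before learning begins, consistent with the abstract's claim that the exploration-relevant information adds no computation cost during learning.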

Citation (APA)

Ratitch, B., & Precup, D. (2003). Using MDP characteristics to guide exploration in reinforcement learning. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2837, pp. 313–324). Springer-Verlag. https://doi.org/10.1007/978-3-540-39857-8_29
