Using MDP characteristics to guide exploration in reinforcement learning


Abstract

We present a new approach for exploration in Reinforcement Learning (RL) based on certain properties of Markov Decision Processes (MDPs). Our strategy promotes more uniform visitation of the state space and more extensive sampling of actions whose action-value estimates have potentially high variance, and it encourages the RL agent to focus on states where it has the most control over the outcomes of its actions. The strategy can be used in combination with existing exploration techniques, and we demonstrate experimentally that it improves the performance of both undirected and directed exploration methods. In contrast to other directed methods, the exploration-relevant information can be precomputed before learning and then used during learning at no additional computational cost.
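The abstract does not spell out how these MDP characteristics are combined into an action-selection rule, so the following is a minimal, hypothetical sketch rather than the authors' method. It assumes a per-state-action exploration bonus built from three illustrative ingredients: a visit-count novelty term (encouraging uniform state-space coverage), the empirical variance of action-value samples, and a precomputed per-state controllability score. All names and weights (w_novelty, w_var, w_ctrl) and the epsilon-greedy combination are assumptions for illustration.

    import numpy as np

    def exploration_bonus(q_samples, visit_counts, controllability,
                          w_novelty=1.0, w_var=1.0, w_ctrl=1.0):
        # q_samples:       (n_actions, n_estimates) samples of Q(s, a)
        # visit_counts:    (n_actions,) visit counts for each action in s
        # controllability: scalar precomputed offline for state s
        #                  (assumption: higher = the agent's actions have
        #                  more influence over outcomes in this state)
        novelty = 1.0 / np.sqrt(visit_counts + 1.0)   # favor rarely tried actions
        variance = q_samples.var(axis=1)              # favor uncertain estimates
        return w_novelty * novelty + w_var * variance + w_ctrl * controllability

    def choose_action(q_means, bonus, epsilon=0.1, rng=None):
        # Combine the directed bonus with a standard undirected scheme
        # (epsilon-greedy), since the abstract notes the strategy can be
        # layered on top of existing exploration techniques.
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < epsilon:
            return int(np.argmax(bonus))   # exploratory step, guided by the bonus
        return int(np.argmax(q_means))     # greedy step

Because the controllability term in this sketch depends only on the MDP's transition structure, it can be computed once before learning begins, consistent with the abstract's claim that the exploration-relevant information adds no computation cost during learning.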

Citation (APA)

Ratitch, B., & Precup, D. (2003). Using MDP characteristics to guide exploration in reinforcement learning. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2837, pp. 313–324). Springer-Verlag. https://doi.org/10.1007/978-3-540-39857-8_29
