Adaptive ε-greedy exploration in reinforcement learning based on value differences

Citations: 222 · Readers: 274

Abstract

This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The method adapts the exploration parameter of ε-greedy based on the temporal-difference error observed in value-function backups, which is taken as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows insight into the method's behavior. Preliminary results indicate that VDBE is more robust to parameter choices than commonly used ad hoc approaches such as ε-greedy or softmax.
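Since only the abstract is given here, the Python sketch below is an illustration of the described mechanism rather than the paper's reference implementation: after each value-function backup, ε is nudged toward a Boltzmann-like function of the absolute value difference, so large TD errors (high uncertainty) raise ε and small ones lower it. The bandit setup, the σ and α values, and the helper name vdbe_update are assumptions made for the example.

import math
import random

def vdbe_update(eps, value_diff, sigma, delta):
    """Move eps toward a Boltzmann-like function of the absolute value difference.

    Large |value_diff| (high uncertainty) pushes eps toward 1 (explore);
    small |value_diff| pushes eps toward 0 (exploit). sigma controls the
    sensitivity; delta is the mixing rate (often 1/|A(s)|).
    """
    x = math.exp(-abs(value_diff) / sigma)
    f = (1.0 - x) / (1.0 + x)
    return delta * f + (1.0 - delta) * eps

# --- toy multi-armed bandit demo (illustrative values throughout) ---
random.seed(0)
true_means = [0.2, 0.5, 0.8]          # unknown arm payoffs
k = len(true_means)
Q = [0.0] * k                         # value estimates
alpha = 0.1                           # learning rate
sigma = 0.33                          # inverse sensitivity (assumed)
delta = 1.0 / k                       # 1/|A(s)| heuristic
eps = 1.0                             # start fully exploratory
                                      # (a bandit has one state, so eps is a scalar)
for t in range(2000):
    if random.random() < eps:
        a = random.randrange(k)                     # explore
    else:
        a = max(range(k), key=lambda i: Q[i])       # exploit
    reward = true_means[a] + random.gauss(0, 0.1)
    td_error = reward - Q[a]
    Q[a] += alpha * td_error                        # value-function backup
    eps = vdbe_update(eps, alpha * td_error, sigma, delta)

print(f"final eps={eps:.3f}, Q={[round(q, 2) for q in Q]}")

Running this, ε decays as the value estimates converge and the TD errors shrink, whereas a fixed-ε agent would keep exploring at the same rate forever; that self-tuning behavior is what the abstract's robustness claim refers to.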

Citation (APA)

Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6359 LNAI, pp. 203–210). https://doi.org/10.1007/978-3-642-16111-7_23
