This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy depending on the temporal-difference error observed from value-function backups, which is taken as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows for insight into the behavior of the method. Preliminary results indicate that VDBE is more robust to parameter choices than commonly used ad hoc approaches such as ε-greedy or softmax. © 2010 Springer-Verlag Berlin Heidelberg.
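The core idea described in the abstract, raising the exploration rate ε when the temporal-difference error is large and lowering it as value estimates stabilize, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the shape of the mapping from TD error to ε, the inverse-sensitivity parameter `sigma`, and the mixing weight `delta` are assumptions chosen for the sketch.

```python
import math

def vdbe_epsilon(epsilon, td_error, sigma=1.0, delta=0.5):
    """One VDBE-style update of the exploration rate.

    A large |td_error| (high uncertainty) pushes epsilon toward 1,
    encouraging exploration; a small |td_error| lets epsilon decay,
    favoring exploitation. `sigma` controls sensitivity to the TD
    error and `delta` how quickly epsilon tracks the new value
    (both hypothetical parameter names for this sketch).
    """
    x = math.exp(-abs(td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # maps |td_error| into [0, 1)
    return delta * f + (1.0 - delta) * epsilon
```

With zero TD error the update simply decays ε geometrically, while a persistent large TD error drives ε toward `delta`-weighted full exploration, which matches the qualitative behavior the abstract attributes to VDBE.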
CITATION STYLE
Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6359 LNAI, pp. 203–210). https://doi.org/10.1007/978-3-642-16111-7_23