This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy depending on the temporal-difference error observed from value-function backups, which is taken as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows for insight into the behavior of the method. Preliminary results indicate that VDBE is more robust to parameter choices than commonly used ad hoc approaches such as ε-greedy or softmax. © 2010 Springer-Verlag Berlin Heidelberg.
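The core idea described in the abstract, raising the exploration rate ε when the temporal-difference error is large and lowering it as value estimates stabilize, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the shape of the mapping from TD error to ε, the inverse-sensitivity parameter `sigma`, and the mixing weight `delta` are assumptions chosen for the sketch.

```python
import math

def vdbe_epsilon(epsilon, td_error, sigma=1.0, delta=0.5):
    """One VDBE-style update of the exploration rate.

    A large |td_error| (high uncertainty) pushes epsilon toward 1,
    encouraging exploration; a small |td_error| lets epsilon decay,
    favoring exploitation. `sigma` controls sensitivity to the TD
    error and `delta` how quickly epsilon tracks the new value
    (both hypothetical parameter names for this sketch).
    """
    x = math.exp(-abs(td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # maps |td_error| into [0, 1)
    return delta * f + (1.0 - delta) * epsilon
```

With zero TD error the update simply decays ε geometrically, while a persistent large TD error drives ε toward `delta`-weighted full exploration, which matches the qualitative behavior the abstract attributes to VDBE.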
CITATION STYLE
Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6359 LNAI, pp. 203–210). https://doi.org/10.1007/978-3-642-16111-7_23