Optimal tuning of continual online exploration in reinforcement learning

Youssef Achbany; Francois Fouss; Luh Yen; Alain Pirotte; Marco Saerens

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), vol. 4131 LNCS-I, pp. 790-800

DOI: 10.1007/11840817_82


Abstract

This paper presents a framework for tuning continual exploration optimally. It first quantifies the rate of exploration by defining the degree of exploration of a state as the entropy of the probability distribution over the admissible actions in that state. The exploration/exploitation tradeoff is then stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost while keeping the degrees of exploration fixed at some nodes. In other words, "exploitation" is maximized for constant "exploration". This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm. Convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path, whereas when exploration is maximal, a fully "blind" exploration is performed. © Springer-Verlag Berlin Heidelberg 2006.
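The abstract does not spell out the form of the optimal strategy, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each state's policy is a Boltzmann (softmin) distribution over the expected costs of its actions, with the parameter theta found by bisection so that the policy's entropy equals the prescribed degree of exploration, and that expected costs are refreshed in a value-iteration-like sweep over a deterministic graph. The function names `entropy`, `boltzmann`, `policy_with_entropy`, `tune_exploration` and the graph encoding `{state: [(cost, next_state), ...]}` are hypothetical.

```python
import math


def entropy(probs):
    """Degree of exploration: Shannon entropy (nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def boltzmann(q_values, theta):
    """Assumed softmin policy: lower expected cost gets higher probability."""
    best = min(q_values)  # shift exponents so the largest weight is exp(0) = 1
    weights = [math.exp(-theta * (q - best)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def policy_with_entropy(q_values, target, iters=200):
    """Pick theta by bisection so the Boltzmann policy has entropy `target`.

    Entropy is non-increasing in theta, which is what makes bisection valid.
    """
    n = len(q_values)
    best = min(q_values)
    if target <= 0.0:
        # No exploration: pure exploitation (greedy, first minimizer on ties).
        greedy = q_values.index(best)
        return [1.0 if i == greedy else 0.0 for i in range(n)]
    if target >= math.log(n) - 1e-12 or max(q_values) == best:
        # Maximal exploration (or all actions equivalent): uniform, "blind".
        return [1.0 / n] * n
    lo, hi = 0.0, 1.0
    while entropy(boltzmann(q_values, hi)) > target and hi < 1e8:
        hi *= 2.0  # widen the bracket until the policy is sharp enough
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(boltzmann(q_values, mid)) > target:
            lo = mid  # still too random: increase theta
        else:
            hi = mid  # too greedy: decrease theta
    return boltzmann(q_values, 0.5 * (lo + hi))


def tune_exploration(graph, goal, degrees, sweeps=1000, tol=1e-9):
    """Value-iteration-like sweeps of expected cost to reach `goal`.

    graph:   {state: [(cost, next_state), ...]}, goal maps to [].
    degrees: {state: prescribed entropy} for every non-goal state.
    Values are updated in place (Gauss-Seidel style) until they stop changing.
    """
    values = {s: 0.0 for s in graph}
    policy = {}
    for _ in range(sweeps):
        delta = 0.0
        for s, actions in graph.items():
            if s == goal:
                continue  # absorbing goal keeps expected cost 0
            q = [c + values[nxt] for c, nxt in actions]
            pi = policy_with_entropy(q, degrees[s])
            new_value = sum(p * qa for p, qa in zip(pi, q))
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
            policy[s] = pi
        if delta < tol:
            break
    return values, policy
```

On a toy graph where A reaches the goal G either directly (cost 4) or through B (cost 1 + 1), a degree of 0 at every state yields the shortest-path values V(A) = 2 and V(B) = 1, matching the Bellman case described in the abstract, while a degree of log 2 (the maximum for two actions) gives a uniform random walk with V(A) = 4 and V(B) = 3. Intermediate degrees land between the two.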

Citation (APA)

Achbany, Y., Fouss, F., Yen, L., Pirotte, A., & Saerens, M. (2006). Optimal tuning of continual online exploration in reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4131 LNCS-I, pp. 790–800). Springer Verlag. https://doi.org/10.1007/11840817_82
