This paper addresses how to generate mixed strategies with reinforcement learning algorithms in domains with stochastic rewards. A new algorithm called TERSQ, based on the Q-learning model, is introduced. Unlike other approaches for stochastic scenarios, TERSQ uses a single global exploration rate for all state-action pairs within a run. This exploration rate is selected at the beginning of each run from a probability distribution, which is updated once the run has finished. We compare TERSQ with similar approaches that use probability distributions depending on state-action pairs. Two experimental scenarios are considered. The first deals with the problem of learning the optimal way to combine several evolutionary algorithms used simultaneously in a hybrid approach. In the second, the objective is to learn the best strategy for a set of competing agents in a combat-based video game. © 2009 Springer Berlin Heidelberg.
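The core idea in the abstract, one exploration rate drawn per run from a distribution that is updated after the run, can be illustrated with a minimal Python sketch. This is not the authors' TERSQ implementation: the candidate rates, the baseline-based weight update in `update_weights`, the toy single-state task in `run_episode`, and every function and parameter name are assumptions made only for illustration.

```python
import random


def sample_rate(rates, weights, rng):
    """Draw one exploration rate for the whole run from a discrete distribution."""
    return rng.choices(rates, weights=weights, k=1)[0]


def update_weights(rates, weights, chosen, run_return, baseline, step=0.1):
    """Assumed update rule (not from the paper): raise the probability of the
    chosen rate if the run beat the baseline, lower it otherwise, renormalise."""
    new = list(weights)
    i = rates.index(chosen)
    if run_return > baseline:
        new[i] += step
    else:
        new[i] = max(new[i] - step, 0.01)  # keep every rate selectable
    total = sum(new)
    return [w / total for w in new]


def run_episode(q, epsilon, rng, steps=50, alpha=0.1):
    """One run on a toy single-state task with noisy rewards (assumed task).
    The same epsilon is used for every action choice in the run."""
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(q))                    # explore
        else:
            a = max(range(len(q)), key=lambda k: q[k])   # exploit
        reward = rng.gauss(1.0 if a == 1 else 0.5, 1.0)  # stochastic reward
        q[a] += alpha * (reward - q[a])  # Q-learning update with discount 0
        total += reward
    return total


def train(runs=200, seed=0):
    rng = random.Random(seed)
    rates = [0.05, 0.2, 0.5]          # hypothetical candidate exploration rates
    weights = [1 / 3, 1 / 3, 1 / 3]   # start from a uniform distribution
    q = [0.0, 0.0]
    baseline = 0.0
    for n in range(1, runs + 1):
        eps = sample_rate(rates, weights, rng)   # chosen once, at the start of the run
        ret = run_episode(q, eps, rng)
        weights = update_weights(rates, weights, eps, ret, baseline)  # after the run
        baseline += (ret - baseline) / n         # running mean of run returns
    return q, weights
```

Calling `train()` returns the learned action values and the final distribution over exploration rates. The approaches the paper compares against would instead keep a separate distribution for each state-action pair.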
CITATION STYLE
Peña, L., Latorre, A., Peña, J. M., & Ossowski, S. (2009). Tentative exploration on reinforcement learning algorithms for stochastic rewards. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5572 LNAI, pp. 336–343). https://doi.org/10.1007/978-3-642-02319-4_40