Tentative exploration on reinforcement learning algorithms for stochastic rewards

Abstract

This paper addresses a way to generate mixed strategies using reinforcement learning algorithms in domains with stochastic rewards. A new algorithm based on the Q-learning model, called TERSQ, is introduced. Unlike other approaches for stochastic scenarios, TERSQ uses a single global exploration rate for all state/action pairs within the same run. This exploration rate is selected at the beginning of each run from a probability distribution, which is updated once the run is finished. In this paper we compare TERSQ with similar approaches that use probability distributions depending on individual state-action pairs. Two experimental scenarios have been considered. The first deals with the problem of learning the optimal way to combine several evolutionary algorithms used simultaneously by a hybrid approach. In the second, the objective is to learn the best strategy for a set of competing agents in a combat-based video game. © 2009 Springer Berlin Heidelberg.
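The abstract describes the core mechanism of TERSQ: one exploration rate is drawn for an entire run, and the distribution it is drawn from is adapted according to how the run went. The Python sketch below illustrates that idea only; it is not the authors' implementation. The toy environment (NoisyChain), the candidate rates in EPSILONS, and the multiplicative weight update are hypothetical details chosen for illustration.

```python
import random
from collections import defaultdict

# Illustrative sketch of a run-global exploration rate for Q-learning,
# as described in the abstract. Candidate rates, the environment, and
# the weight-update rule are assumptions, not taken from the paper.

EPSILONS = [0.05, 0.1, 0.2, 0.4]      # candidate global exploration rates
weights = [1.0] * len(EPSILONS)       # unnormalized selection distribution


class NoisyChain:
    """Toy 5-step environment with stochastic (Gaussian) rewards."""
    actions = [0, 1]

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1
        mean = 1.0 if action == 1 else 0.2   # action 1 is better on average
        reward = random.gauss(mean, 0.5)     # but rewards are noisy
        return self.pos, reward, self.pos >= 5


def run_episode(env, Q, epsilon, alpha=0.1, gamma=0.95):
    """Play one run with a fixed, global epsilon; return the total reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        if random.random() < epsilon:        # explore
            action = random.choice(env.actions)
        else:                                # exploit
            action = max(env.actions, key=lambda a: Q[(state, a)])
        nxt, reward, done = env.step(action)
        best_next = max(Q[(nxt, a)] for a in env.actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state, total = nxt, total + reward
    return total


def train(env, n_runs=500):
    Q = defaultdict(float)
    for _ in range(n_runs):
        # One global exploration rate is sampled for the entire run,
        # rather than a per-state-action rate.
        i = random.choices(range(len(EPSILONS)), weights=weights)[0]
        ret = run_episode(env, Q, EPSILONS[i])
        # Reinforce rates that produced higher returns (illustrative rule).
        weights[i] = max(1e-3, weights[i] * (1.0 + 0.01 * ret))
    return Q


Q = train(NoisyChain())
```

The key contrast with per-state-action exploration schedules is visible in train(): the sampled rate applies uniformly to every decision in the run, and only the run's total return feeds back into the selection distribution.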

Citation (APA)

Peña, L., Latorre, A., Peña, J. M., & Ossowski, S. (2009). Tentative exploration on reinforcement learning algorithms for stochastic rewards. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5572 LNAI, pp. 336–343). https://doi.org/10.1007/978-3-642-02319-4_40
