Reinforcement learning for average reward zero-sum games

Abstract

We consider reinforcement learning for average reward zero-sum stochastic games and analyze two algorithms. The first is based on relative Q-learning, and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ordinary differential equation (ODE) method. We further discuss the case where the opponent does not play all actions with comparable frequencies, and present an algorithm that converges to the optimal Q-function given the observed play of the opponent.
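The abstract names relative Q-learning without giving the update rule, so the following is only a rough sketch of what one step of such a scheme might look like, not the paper's algorithm. It assumes a tabular Q indexed by state, own action, and opponent action; it assumes a relative-value-iteration style correction that subtracts a fixed reference entry of Q; and it assumes exactly two actions per player so that the stage-game value has a closed form. The names `matrix_game_value_2x2` and `relative_q_update`, the `ref` parameter, and the step size are all hypothetical.

```python
def matrix_game_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game in which the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return maximin
    # No saddle point: the mixed equilibrium value has a closed form,
    # and the denominator is nonzero in this case.
    return (a * d - b * c) / (a + d - b - c)


def relative_q_update(Q, s, a, b, r, s_next, step, ref=(0, 0, 0)):
    """One assumed relative Q-learning step for an observed transition.

    Q[s][a][b] is a nested list; `ref` picks the reference entry that is
    subtracted to keep the iterates bounded (an assumption of this sketch).
    """
    s0, a0, b0 = ref
    target = r + matrix_game_value_2x2(Q[s_next]) - Q[s0][a0][b0]
    Q[s][a][b] += step * (target - Q[s][a][b])


# Usage: two states, two actions per player, Q initialized to zero.
Q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
relative_q_update(Q, s=0, a=0, b=0, r=1.0, s_next=1, step=0.5)
```

For more than two actions per player, the stage-game value would normally come from a linear program instead of the closed form used here.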

Cite

Mannor, S. (2004). Reinforcement learning for average reward zero-sum games. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3120, pp. 49–63). Springer Verlag. https://doi.org/10.1007/978-3-540-27819-1_4
