Convergent gradient ascent in general-sum games

Abstract

In this work we examine recent results on policy gradient learning in general-sum games, embodied in two algorithms: IGA and WoLF-IGA. We address the drawbacks in the convergence properties of these algorithms and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash equilibrium policies in self-play (or against an IGA learner). We also present a control-theoretic interpretation of the variable learning rate, which not only justifies WoLF-IGA but also shows that it achieves the fastest convergence under certain constraints. Finally, we derive optimal learning rates for the fastest convergence in practical simulations.
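
To make the variable learning rate idea concrete, below is a minimal Python sketch of the original WoLF-IGA update in self-play on a two-action game (not the paper's refined variant). The payoff matrices, the equilibrium strategies a_eq/b_eq used in the win/lose test, and the rates l_min/l_max are illustrative assumptions; with small finite steps the loop only approximates the infinitesimal dynamics analyzed in the paper.

```python
import numpy as np

# Illustrative payoff matrices (matching pennies): R[i, j] is the row
# player's payoff, C[i, j] the column player's. The update itself is
# written for general-sum games.
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R

def expected_payoff(M, a, b):
    """Expected payoff when the row player picks action 0 with
    probability a and the column player with probability b."""
    p = np.array([a, 1 - a])
    q = np.array([b, 1 - b])
    return p @ M @ q

def gradient_row(a, b):
    """d/da of the row player's expected payoff (linear in a,
    so the gradient depends only on b)."""
    u = R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]
    return b * u + (R[0, 1] - R[1, 1])

def gradient_col(a, b):
    """d/db of the column player's expected payoff."""
    u = C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]
    return a * u + (C[1, 0] - C[1, 1])

def wolf_iga(a, b, a_eq=0.5, b_eq=0.5, l_min=0.01, l_max=0.04, steps=20000):
    """WoLF-IGA in self-play: each player ascends its payoff gradient,
    learning slowly (l_min) when 'winning' -- doing better than its
    equilibrium strategy would against the opponent's current play --
    and fast (l_max) when losing."""
    for _ in range(steps):
        lr_row = l_min if expected_payoff(R, a, b) > expected_payoff(R, a_eq, b) else l_max
        lr_col = l_min if expected_payoff(C, a, b) > expected_payoff(C, a, b_eq) else l_max
        a = np.clip(a + lr_row * gradient_row(a, b), 0.0, 1.0)
        b = np.clip(b + lr_col * gradient_col(a, b), 0.0, 1.0)
    return a, b

print(wolf_iga(0.9, 0.2))  # spirals in toward the mixed equilibrium (0.5, 0.5)
```

With a fixed learning rate (plain IGA) the strategy pair orbits the mixed equilibrium in this game; switching between l_min and l_max based on the win/lose test is what damps the orbit and yields convergence.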

Citation (APA)

Banerjee, B., & Peng, J. (2002). Convergent gradient ascent in general-sum games. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2430, pp. 1–9). Springer Verlag. https://doi.org/10.1007/3-540-36755-1_1
