Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

Abstract

The hierarchical interaction between the actor and critic in actor-critic-based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient theorem for the refined update and provide a local convergence guarantee for the Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, experiments on OpenAI Gym environments show that Stackelberg actor-critic algorithms always perform at least as well as, and often significantly outperform, their standard actor-critic counterparts.
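To make the key idea concrete, the sketch below (not taken from the paper) applies the leader's total-derivative update to a hypothetical two-player quadratic game, with leader objective f1(x, y) = x^2 + x*y and follower objective f2(x, y) = y^2 - x*y. The leader differentiates through the follower's best response via the implicit function theorem, while the follower descends its own individual gradient; in the actor-critic instantiation described in the abstract, these toy objectives would be replaced by the actor and critic losses.

def leader_grads(x, y):
    """Individual gradients of the leader objective f1(x, y) = x**2 + x*y."""
    return 2 * x + y, x  # (df1/dx, df1/dy)

def follower_grad(x, y):
    """Follower's individual gradient of f2(x, y) = y**2 - x*y with respect to y."""
    return 2 * y - x

# Second-order terms of f2 (constants for this quadratic example).
d2f2_dy2 = 2.0    # d^2 f2 / dy^2, assumed invertible for the implicit function theorem
d2f2_dydx = -1.0  # d^2 f2 / dy dx

x, y, lr = 1.0, 1.0, 0.1
for _ in range(200):
    df1_dx, df1_dy = leader_grads(x, y)
    # Total derivative: df1/dx - (d^2 f2/dy dx) * (d^2 f2/dy^2)^-1 * df1/dy
    stackelberg_grad = df1_dx - d2f2_dydx * (1.0 / d2f2_dy2) * df1_dy
    x -= lr * stackelberg_grad     # leader follows the total derivative
    y -= lr * follower_grad(x, y)  # follower follows its usual individual gradient

print(f"x = {x:.4f}, y = {y:.4f}")  # converges toward the Stackelberg equilibrium (0, 0)

Here the follower's best response is y*(x) = x/2, so the leader's total derivative 2x + y + x/2 coincides with the gradient of f1(x, y*(x)) = 1.5 x^2, which is exactly the anticipatory structure the individual gradient lacks.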

Citation (APA)

Zheng, L., Fiez, T., Alumbaugh, Z., Chasnov, B., & Ratliff, L. J. (2022). Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 9217–9224). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i8.20908
