Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Abstract

We study a multi-agent reinforcement learning (MARL) problem in which the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy whose optimality gap decays polynomially in κ. In addition, we establish the finite-sample convergence of LPI to the globally optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI.
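The localization idea described in the abstract, each agent conditioning its learning only on agents within its κ-hop neighborhood of the interaction graph, can be illustrated with a short sketch. This is not the paper's implementation; the adjacency structure, function name, and example graph below are hypothetical and chosen only to show how a κ-hop neighborhood is computed.

```python
# Illustrative sketch (assumed names, not the LPI code from the paper):
# compute the set of agents within kappa hops of a given agent via BFS.
from collections import deque

def kappa_hop_neighborhood(adjacency, agent, kappa):
    """Return the set of agents within `kappa` hops of `agent`."""
    visited = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adjacency[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, dist + 1))
    return visited

# Hypothetical example: a line graph of 5 agents, 0 - 1 - 2 - 3 - 4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(kappa_hop_neighborhood(adjacency, agent=2, kappa=1))  # {1, 2, 3}
print(kappa_hop_neighborhood(adjacency, agent=2, kappa=2))  # {0, 1, 2, 3, 4}
```

In this sketch, a larger κ enlarges each agent's view of the network, which (per the abstract) shrinks the optimality gap polynomially in κ at the cost of more computation and communication.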

Citation (APA)

Zhang, Y., Qu, G., Xu, P., Lin, Y., Chen, Z., & Wierman, A. (2023). Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(1). https://doi.org/10.1145/3579443
