Multi-agent reinforcement learning with directed exploration and selective memory reuse

Abstract

Many tasks require cooperation and coordination of multiple agents. Multi-agent reinforcement learning (MARL) can effectively learn solutions to these problems, but exploration and local optima problems are still open research topics. In this paper, we propose a new multi-agent policy gradient method called decentralized exploration and selective memory policy gradient (DecESPG) that addresses these issues. DecESPG consists of two additional components built on policy gradient: 1) an exploration bonus component that directs agents to explore novel observations and actions and 2) a selective memory component that records past trajectories to reuse valuable experience and reinforce cooperative behavior. Experimental results verify that the proposed method learns faster and outperforms state-of-the-art MARL algorithms.
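To make the two components concrete, below is a minimal Python sketch of the general idea: an intrinsic exploration bonus added to the environment reward, plus a memory that keeps only high-return trajectories for reuse. The specific choices here (a count-based 1/sqrt(N) novelty bonus and a top-k-by-return selection rule) and all class and variable names are illustrative assumptions, not the formulation used in the paper.

```python
from collections import defaultdict
import heapq
import itertools
import math


class ExplorationBonus:
    """Count-based novelty bonus over observation-action pairs.

    Illustrative assumption: the paper's bonus is not necessarily count-based;
    this only shows how an intrinsic reward can be added to the env reward.
    """

    def __init__(self, scale=0.1):
        self.counts = defaultdict(int)
        self.scale = scale

    def __call__(self, obs, action):
        key = (obs, action)
        self.counts[key] += 1
        # Rarely visited (obs, action) pairs receive a larger intrinsic reward.
        return self.scale / math.sqrt(self.counts[key])


class SelectiveMemory:
    """Retain only the highest-return trajectories for reuse in later updates.

    Illustrative selection rule: keep the top-k trajectories by episode return.
    """

    def __init__(self, capacity=50):
        self.capacity = capacity
        self._heap = []  # min-heap of (return, tie-breaker, trajectory)
        self._counter = itertools.count()

    def maybe_store(self, trajectory, episode_return):
        entry = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif episode_return > self._heap[0][0]:
            # Evict the lowest-return trajectory currently stored.
            heapq.heapreplace(self._heap, entry)

    def stored(self):
        return [traj for _, _, traj in self._heap]


# Usage sketch: shape each agent's reward with the bonus, then decide whether
# the finished episode is worth keeping for later policy-gradient updates.
bonus = ExplorationBonus(scale=0.1)
memory = SelectiveMemory(capacity=50)

trajectory, episode_return = [], 0.0
for obs, action, env_reward in [("s0", "a1", 0.0), ("s1", "a0", 1.0)]:  # dummy rollout
    shaped = env_reward + bonus(obs, action)
    trajectory.append((obs, action, shaped))
    episode_return += env_reward

memory.maybe_store(trajectory, episode_return)
print(len(memory.stored()))  # -> 1
```

In a decentralized setting each agent would hold its own bonus counts and memory; how the stored trajectories re-enter the policy-gradient update is the part this sketch deliberately leaves out.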

Citation (APA)

Jiang, S., & Amato, C. (2021). Multi-agent reinforcement learning with directed exploration and selective memory reuse. In Proceedings of the ACM Symposium on Applied Computing (pp. 777–784). Association for Computing Machinery. https://doi.org/10.1145/3412841.3441953
