Deep Skill Chaining with Diversity for Multi-agent Systems*

Abstract

Multi-agent reinforcement learning relies on reward signals from the environment to guide the convergence of each agent's policy network. In high-dimensional continuous spaces, however, the non-stationarity of the environment can supply outdated experiences that prevent convergence, and existing methods often fail to reach satisfactory training performance for the same reason. We propose MADSC, a novel reinforcement learning scheme that generates an optimized cooperative policy. Our scheme uses a mutual-information-based intrinsic reward, built on the option framework, to induce cooperative behavior. In addition, linking the learned skills into a skill chain significantly accelerates agent learning, so multi-agent systems using MADSC gain a strategic advantage while requiring far fewer learning steps. Experiments on SMAC multi-agent tasks of varying difficulty demonstrate that MADSC, with a single layer of temporal abstraction, outperforms state-of-the-art methods including IQL, QMIX, and hDQN.
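
The abstract does not give the exact form of the intrinsic reward. A common mutual-information-based formulation from the skill-discovery literature (e.g., DIAYN), offered here only as an illustrative assumption and not necessarily the paper's exact objective, maximizes the mutual information I(S; Z) between visited states S and a skill latent Z via a variational lower bound:

    r_int(s, z) = log q_φ(z | s) − log p(z)

where q_φ is a learned skill discriminator and p(z) is the (typically uniform) skill prior. The agent is rewarded for reaching states from which its current skill is identifiable, which encourages diverse, distinguishable skills; skill chaining in the option-framework sense then links such learned skills so that terminating one option triggers the initiation of the next.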

Citation (APA)

Xie, Z., Ji, C., & Zhang, Y. (2022). Deep Skill Chaining with Diversity for Multi-agent Systems*. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13606 LNAI, pp. 208–220). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20503-3_17
