Hybrid learning for multi-agent cooperation with sub-optimal demonstrations

Abstract

This paper aims to learn multi-agent cooperation in which each agent acts in a decentralized way. Learning decentralized policies is especially challenging when rewards are global and sparse. Learning from demonstrations (LfD) offers a promising way to handle this challenge; however, in many practical tasks the available demonstrations are sub-optimal. To learn better policies from such sub-optimal demonstrations, this paper follows the centralized-learning, decentralized-execution framework and proposes a novel hybrid learning method based on multi-agent actor-critic. First, the trajectory returns computed from the demonstration actions are used to pre-train the centralized critic network. Then, multi-agent decisions are derived by best-response dynamics under the critic and used to train the decentralized actor networks. Finally, the demonstrations are updated by the actor networks, and the critic and actor networks are trained jointly by running the two preceding steps alternately. We evaluate the proposed approach on a real-time strategy combat game; experimental results show that it outperforms many competing demonstration-based methods.
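To make the alternating loop concrete, below is a minimal sketch in PyTorch of the three steps the abstract describes. Everything in it is an illustrative assumption rather than the authors' implementation: the toy dimensions (N_AGENTS, STATE_DIM, N_ACTIONS), the network shapes, and all names (Actor, CentralCritic, pretrain_critic, best_response, hybrid_learning) are hypothetical; best-response dynamics is realized as coordinate-ascent sweeps over each agent's discrete actions under the critic; and the demonstration returns are held fixed across iterations, whereas the paper would obtain fresh returns by running the updated demonstrations in the environment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setting (all values hypothetical): N agents with discrete actions
# and a shared global state.
N_AGENTS, STATE_DIM, N_ACTIONS = 2, 8, 4

class Actor(nn.Module):
    """Decentralized policy: maps an observation to action logits.
    For simplicity each actor here sees the global state; in a truly
    decentralized setting it would see only a local observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized Q-network over the state and the joint one-hot action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_AGENTS * N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, states, joint_onehot):
        return self.net(torch.cat([states, joint_onehot], -1)).squeeze(-1)

def onehot_joint(actions):
    # (batch, N_AGENTS) int64 -> (batch, N_AGENTS * N_ACTIONS) float
    return F.one_hot(actions, N_ACTIONS).float().flatten(1)

def pretrain_critic(critic, states, actions, returns, steps=200):
    """Step 1: regress the critic onto demonstration trajectory returns."""
    opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = F.mse_loss(critic(states, onehot_joint(actions)), returns)
        opt.zero_grad()
        loss.backward()
        opt.step()

def best_response(critic, states, actions, sweeps=5):
    """Step 2: best-response dynamics under the critic. Each agent in
    turn switches to the action maximizing Q while the other agents'
    actions stay fixed."""
    actions = actions.clone()
    with torch.no_grad():
        for _ in range(sweeps):
            for i in range(N_AGENTS):
                qs = []
                for a in range(N_ACTIONS):
                    cand = actions.clone()
                    cand[:, i] = a
                    qs.append(critic(states, onehot_joint(cand)))
                actions[:, i] = torch.stack(qs, 1).argmax(1)
    return actions

def hybrid_learning(actors, critic, demo, iters=10):
    states, actions, returns = demo
    opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
    for _ in range(iters):
        pretrain_critic(critic, states, actions, returns)   # step 1
        targets = best_response(critic, states, actions)    # step 2
        for i, (actor, opt) in enumerate(zip(actors, opts)):
            # Step 3a: supervise each actor on its best-response action.
            loss = F.cross_entropy(actor(states), targets[:, i])
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            # Step 3b: relabel demonstration actions with the actors.
            # (Simplification: returns stay fixed; the paper would
            # re-evaluate the updated demonstrations in the environment.)
            actions = torch.stack(
                [actor(states).argmax(1) for actor in actors], 1)

# Usage with random placeholder "demonstrations":
demo = (torch.randn(64, STATE_DIM),
        torch.randint(0, N_ACTIONS, (64, N_AGENTS)),
        torch.randn(64))
hybrid_learning([Actor() for _ in range(N_AGENTS)], CentralCritic(), demo)
```

The fixed-return shortcut in step 3b only keeps the sketch self-contained; in the paper's setting the relabeled demonstrations would be rolled out in the combat game to produce new returns before the critic is re-trained.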

Citation (APA)

Peng, P., Xing, J., & Cao, L. (2020). Hybrid learning for multi-agent cooperation with sub-optimal demonstrations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) (pp. 3037–3043). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/420
