Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning


Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach achieves sample efficiency comparable to, if not better than, that of reward machines constructed by hand for multi-agent tasks.
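To make the abstract's central object concrete: a reward machine is a finite-state machine whose transitions are triggered by high-level environment events and emit rewards. The sketch below is purely illustrative — the class, the event names, and the two-agent "press both buttons, then open the door" team task are hypothetical examples, not taken from the paper.

```python
class RewardMachine:
    """Minimal reward machine: finite states, event-labelled transitions, rewards."""

    def __init__(self, initial, transitions, terminal):
        # transitions maps (state, event) -> (next_state, reward);
        # unlisted events leave the state unchanged and give reward 0.
        self.initial = initial
        self.transitions = transitions
        self.terminal = terminal

    def run(self, events):
        """Feed a trace of high-level events; return final state and total reward."""
        state, total = self.initial, 0.0
        for e in events:
            if state in self.terminal:
                break
            state, r = self.transitions.get((state, e), (state, 0.0))
            total += r
        return state, total

# Hypothetical team task: agent A presses its button, then agent B presses
# its button, and only then does opening the door yield the team reward.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "buttonA"): ("u1", 0.0),
        ("u1", "buttonB"): ("u2", 0.0),
        ("u2", "door"):    ("u3", 1.0),  # team goal reached
    },
    terminal={"u3"},
)

state, reward = rm.run(["buttonA", "buttonB", "door"])
# state == "u3", reward == 1.0
```

Decomposition, as described in the abstract, would then project this team machine onto each agent (e.g. agent A only observes `buttonA` events), so that each agent can learn against its own smaller machine.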

Citation (APA)

Varricchione, G., Alechina, N., Dastani, M., & Logan, B. (2023). Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14282 LNAI, pp. 328–344). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-43264-4_21
