Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning


Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach achieves sample efficiency comparable to, if not better than, that of reward machines constructed by hand for multi-agent tasks.
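To make the abstract's central object concrete: a reward machine is a finite-state machine whose transitions are triggered by high-level environment events and emit rewards. The sketch below is purely illustrative — the class, the event names, and the two-agent "press both buttons, then open the door" team task are hypothetical examples, not taken from the paper.

```python
class RewardMachine:
    """Minimal reward machine: finite states, event-labelled transitions, rewards."""

    def __init__(self, initial, transitions, terminal):
        # transitions maps (state, event) -> (next_state, reward);
        # unlisted events leave the state unchanged and give reward 0.
        self.initial = initial
        self.transitions = transitions
        self.terminal = terminal

    def run(self, events):
        """Feed a trace of high-level events; return final state and total reward."""
        state, total = self.initial, 0.0
        for e in events:
            if state in self.terminal:
                break
            state, r = self.transitions.get((state, e), (state, 0.0))
            total += r
        return state, total

# Hypothetical team task: agent A presses its button, then agent B presses
# its button, and only then does opening the door yield the team reward.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "buttonA"): ("u1", 0.0),
        ("u1", "buttonB"): ("u2", 0.0),
        ("u2", "door"):    ("u3", 1.0),  # team goal reached
    },
    terminal={"u3"},
)

state, reward = rm.run(["buttonA", "buttonB", "door"])
# state == "u3", reward == 1.0
```

Decomposition, as described in the abstract, would then project this team machine onto each agent (e.g. agent A only observes `buttonA` events), so that each agent can learn against its own smaller machine.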

Citation (APA)

Varricchione, G., Alechina, N., Dastani, M., & Logan, B. (2023). Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14282 LNAI, pp. 328–344). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-43264-4_21
