Intrinsic fluctuations of reinforcement learning promote cooperation

Abstract

In this work, we ask, and answer, what makes classical temporal-difference reinforcement learning with ϵ-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its next action choice on both agents’ action choices of the last round. We find that, alongside a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
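
To make the setup concrete, below is a minimal sketch (not the authors' code) of the testbed described in the abstract: two tabular temporal-difference (Q-learning) agents with ϵ-greedy action selection playing the iterated Prisoner's dilemma, each conditioning its choice on both agents' actions from the previous round. The payoff values (T=5, R=3, P=1, S=0) and the hyperparameters (small learning rate, high discount factor, low exploration rate) are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch: two epsilon-greedy Q-learners in the iterated Prisoner's
# dilemma with one-period memory. Payoffs and hyperparameters are
# illustrative assumptions, not the paper's exact settings.
import numpy as np

rng = np.random.default_rng(0)

# Actions: 0 = cooperate (C), 1 = defect (D).
# Standard PD payoffs for the row player: (my action, opponent action) -> reward.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

ALPHA = 0.05    # small learning rate
GAMMA = 0.95    # high valuation of future rewards (discount factor)
EPSILON = 0.01  # low exploration rate
STEPS = 200_000

# State = both agents' actions in the previous round: 4 states, 2 actions each.
Q = [np.zeros((4, 2)), np.zeros((4, 2))]

def choose(q_row):
    """Epsilon-greedy action selection over one row of the Q-table."""
    if rng.random() < EPSILON:
        return int(rng.integers(2))
    return int(np.argmax(q_row))

state = int(rng.integers(4))  # arbitrary initial memory of the last round
for _ in range(STEPS):
    a0 = choose(Q[0][state])
    a1 = choose(Q[1][state])
    r0 = PAYOFF[(a0, a1)]
    r1 = PAYOFF[(a1, a0)]
    next_state = 2 * a0 + a1
    # Temporal-difference (Q-learning) update for each agent.
    for i, (a, r) in enumerate([(a0, r0), (a1, r1)]):
        td_target = r + GAMMA * Q[i][next_state].max()
        Q[i][state, a] += ALPHA * (td_target - Q[i][state, a])
    state = next_state

print("Greedy strategies per memory state (CC, CD, DC, DD):")
for i in range(2):
    print(f"agent {i}:", ["C" if np.argmax(Q[i][s]) == 0 else "D" for s in range(4)])
```

In this kind of simulation, the run-to-run variability induced by ϵ-greedy exploration and the stochastic updates is exactly the "intrinsic fluctuation" the paper studies; averaging many such runs is what reveals how often the learners end up in mutually cooperative strategies.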

Cite

Barfuss, W., & Meylahn, J. M. (2023). Intrinsic fluctuations of reinforcement learning promote cooperation. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-27672-7
