Towards a Fast Detection of Opponents in Repeated Stochastic Games


Abstract

Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counterparties. This article presents a formal model of such sequential interactions, and a corresponding algorithm that combines two established frameworks: Pepper and Bayesian policy reuse. In each interaction, the algorithm faces a repeated stochastic game with an unknown (small) number of repetitions against a random opponent drawn from a population, without observing the opponent’s identity. Our algorithm consists of two main steps: first, it draws inspiration from multi-agent algorithms to obtain acting policies in stochastic games; second, it computes a belief over the possible opponents that is updated as the interaction unfolds. This allows the agent to quickly select the appropriate policy against the opponent. Our results show fast detection of the opponent from its behavior, obtaining higher average rewards than the state-of-the-art baseline Pepper in repeated stochastic games.
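The belief-update step described above follows the Bayesian policy reuse pattern: maintain a probability over candidate opponent types, update it with the likelihood of each observed behavior, and act with the policy that maximizes expected return under the current belief. The sketch below is only an illustration of that pattern, not the authors' implementation; the opponent types, likelihoods, and utility values are hypothetical.

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One Bayesian update of the belief over opponent types.

    belief:      prior probability of each candidate opponent type.
    likelihoods: P(observed behavior | type) for each type.
    """
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def select_policy(belief, utility):
    """Pick the policy with the highest expected utility under the belief.

    utility[i, j] = expected return of policy i against opponent type j
    (hypothetical values for illustration).
    """
    return int(np.argmax(utility @ belief))

# Hypothetical setup: three known opponent types, uniform prior,
# and one pre-computed best-response policy per type.
belief = np.ones(3) / 3
utility = np.array([[1.0, 0.2, 0.1],
                    [0.1, 0.9, 0.3],
                    [0.2, 0.1, 0.8]])

# Observations most consistent with type 1 sharpen the belief quickly,
# so the agent switches to that type's best-response policy.
for _ in range(3):
    belief = update_belief(belief, np.array([0.1, 0.7, 0.2]))

print(select_policy(belief, utility))  # -> 1
```

After only a few consistent observations the posterior concentrates on one type, which is the "fast detection" effect the abstract refers to.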

Citation (APA)

Hernandez-Leal, P., & Kaisers, M. (2017). Towards a Fast Detection of Opponents in Repeated Stochastic Games. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10642 LNAI, pp. 239–257). Springer Verlag. https://doi.org/10.1007/978-3-319-71682-4_15
