On multiagent Q-learning in a semi-competitive domain

Abstract

Q-learning is a recent reinforcement learning (RL) algorithm that does not need a model of its environment and can be used on-line. Therefore it is well-suited for use in repeated games against an unknown opponent. Most RL research has been confined to single-agent settings or to multiagent settings where the agents have totally positively correlated payoffs (team problems) or totally negatively correlated payoffs (zero-sum games). This paper is an empirical study of reinforcement learning in the iterated prisoner's dilemma (IPD), where the agents' payoffs are neither totally positively nor totally negatively correlated. RL is considerably more difficult in such a domain. This paper investigates the ability of a variety of Q-learning agents to play the IPD game against an unknown opponent. In some experiments, the opponent is the fixed strategy Tit-for-Tat, while in others it is another Q-learner. All the Q-learners learned to play optimally against Tit-for-Tat. Playing against another learner was more difficult because the adaptation of the other learner creates a nonstationary environment in ways that are detailed in the paper. The learners that were studied varied along three dimensions: the length of history they received as context, the type of memory they employed (lookup tables based on restricted history windows or recurrent neural networks (RNNs) that can theoretically store features from arbitrarily deep in the past), and the exploration schedule they followed. Although all the learners faced difficulties when playing against other learners, agents with longer history windows, lookup table memories, and longer exploration schedules fared best in the IPD games.
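
To make the Tit-for-Tat setting concrete, here is a minimal sketch of a lookup-table Q-learner with a history window of one round. It is an illustration and not the authors' implementation: the payoff values (5, 3, 1, 0), the learning rate, the discount factor, the linearly decaying epsilon-greedy exploration, the starting history, and the names `train_vs_tit_for_tat` and `greedy_policy` are all assumptions, since the abstract specifies none of them.

```python
import random

# Assumed payoffs to the learner, keyed by (own move, opponent move).
# The abstract does not list the payoff matrix; these are common textbook values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def train_vs_tit_for_tat(rounds=50_000, alpha=0.2, gamma=0.9, seed=0):
    """Tabular Q-learning against Tit-for-Tat (hypothetical helper)."""
    rng = random.Random(seed)
    # State = previous round's (own move, opponent move): a history window of 1.
    q = {(m, o): {a: 0.0 for a in ACTIONS} for m in ACTIONS for o in ACTIONS}
    state = ("C", "C")  # assumed starting history, so Tit-for-Tat opens with C
    for t in range(rounds):
        # Assumed exploration schedule: epsilon decays linearly to a 0.05 floor.
        epsilon = max(0.05, 1.0 - t / rounds)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        opponent = state[0]  # Tit-for-Tat repeats the learner's previous move
        reward = PAYOFF[(action, opponent)]
        next_state = (action, opponent)
        # Standard one-step Q-learning update.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state in the table."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in q}
```

Calling `greedy_policy(train_vs_tit_for_tat())` returns the learned move for each of the four one-round histories. Under these assumed payoffs and a discount factor of 0.9, cooperating forever against Tit-for-Tat is worth 3 / (1 - 0.9) = 30 in discounted return, which exceeds what any defection earns. Against a second learner the opponent's move is no longer a fixed function of the state, which is the nonstationarity the abstract describes.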

Cite

Sandholm, T. W., & Crites, R. H. (1996). On multiagent Q-learning in a semi-competitive domain. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1042, pp. 191–205). Springer Verlag. https://doi.org/10.1007/3-540-60923-7_28
