Learning a strategy that maximises total reward in a multi-agent system is a hard problem when the reward depends on other agents' strategies. Many previous approaches consider opponents that are reactive and memoryless. In this paper, we use sequence prediction algorithms to model opponents with memory in two-player games. We argue that lookahead is required to compete with such opponents. We combine these prediction algorithms with reinforcement learning and lookahead action selection, allowing them to find strategies that maximise total reward up to a limited depth. Experiments confirm that lookahead is required, and show that these algorithms successfully model and exploit opponent strategies with different memory lengths. The proposed approach outperforms popular and state-of-the-art reinforcement learning algorithms in both learning speed and final performance. © 2013 Springer-Verlag.
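To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' exact method) of the two ingredients the abstract names: a sequence predictor over the opponent's move history (an n-gram counter here, standing in for the paper's sequence prediction algorithms) and depth-limited lookahead action selection, illustrated on iterated Rock-Paper-Scissors. All names and parameters are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative two-player game: Rock-Paper-Scissors.
MOVES = ("R", "P", "S")
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def payoff(mine, theirs):
    """+1 win, 0 draw, -1 loss for the modelling player."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1

class NGramPredictor:
    """Predict the opponent's next move from their last `order` moves
    (a stand-in for the paper's sequence prediction algorithms)."""
    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, next_move):
        ctx = tuple(history[-self.order:])
        self.counts[ctx][next_move] += 1

    def predict(self, history):
        """Return a distribution over the opponent's next move."""
        ctx = tuple(history[-self.order:])
        obs = self.counts.get(ctx, {})
        total = sum(obs.values())
        if total == 0:  # unseen context: fall back to uniform
            return {m: 1.0 / len(MOVES) for m in MOVES}
        return {m: obs.get(m, 0) / total for m in MOVES}

def lookahead(predictor, history, depth):
    """Choose the move maximising expected total payoff over `depth`
    future steps, using the predictor as the opponent model."""
    if depth == 0:
        return None, 0.0
    dist = predictor.predict(history)
    best_move, best_value = None, float("-inf")
    for mine in MOVES:
        value = 0.0
        for theirs, p in dist.items():
            if p == 0.0:
                continue
            _, future = lookahead(predictor, history + [theirs], depth - 1)
            value += p * (payoff(mine, theirs) + future)
        if value > best_value:
            best_move, best_value = mine, value
    return best_move, best_value

# Example: an opponent with memory who cycles R, P, S. After observing
# the cycle, the predictor plus lookahead exploits it.
pred = NGramPredictor(order=2)
opp_history = []
for move in ["R", "P", "S"] * 10:
    pred.update(opp_history, move)
    opp_history.append(move)
best, value = lookahead(pred, opp_history, depth=1)
# The last two opponent moves are (P, S), so "R" is predicted next,
# and the best reply is "P".
```

This toy version omits the reinforcement learning component and updates the model only between games; it is meant solely to show how a sequence predictor and limited-depth lookahead fit together.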
CITATION STYLE
Mealing, R., & Shapiro, J. L. (2013). Opponent modelling by sequence prediction and lookahead in two-player games. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7895 LNAI, pp. 385–396). https://doi.org/10.1007/978-3-642-38610-7_36