Learning from Monte Carlo Rollouts with Opponent Models for Playing Tron

Abstract

This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be applied effectively to generate state trajectories for all possible actions, from which improved action estimates can be computed. This makes it possible to extend experience replay so that the state-action values of all actions in a given game state are updated simultaneously. The results show that experience replay that updates the Q-values of all actions simultaneously strongly outperforms conventional experience replay, which only updates the Q-value of the performed action. The results also show that using short or long rollout horizons during training leads to similarly good performance against two fixed opponents.
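The abstract does not give implementation details, but the core idea can be sketched as follows. The snippet below is a minimal illustration, not the authors' code: it assumes hypothetical interfaces `env_step(state, our_action, opponent_action)` returning `(next_state, reward, done)`, an `opponent_model(state)` returning the modelled opponent action, and a Q-network exposing `predict(state)` and `update(state, targets)`; the replay buffer is assumed to store visited game states.

```python
import numpy as np

def rollout_return(env_step, q_predict, opponent_model, state, action,
                   horizon, gamma=0.99):
    """Estimate the return of taking `action` in `state` by simulating up to
    `horizon` steps: our later moves are chosen greedily w.r.t. the current
    Q-network, the opponent's moves come from the learned opponent model."""
    total, discount = 0.0, 1.0
    next_state, reward, done = env_step(state, action, opponent_model(state))
    total += discount * reward
    for _ in range(horizon - 1):
        if done:
            return total
        discount *= gamma
        greedy = int(np.argmax(q_predict(next_state)))
        next_state, reward, done = env_step(next_state, greedy,
                                            opponent_model(next_state))
        total += discount * reward
    if not done:
        # Bootstrap with the network's value estimate at the rollout horizon.
        total += discount * gamma * float(np.max(q_predict(next_state)))
    return total

def replay_all_actions(env_step, q_network, opponent_model, replay_states,
                       n_actions, horizon):
    """Experience replay that refreshes the Q-values of *all* actions in each
    replayed state, using rollout-based targets instead of only the target
    for the action that was actually performed."""
    for state in replay_states:
        targets = np.array([
            rollout_return(env_step, q_network.predict, opponent_model,
                           state, a, horizon)
            for a in range(n_actions)
        ])
        q_network.update(state, targets)  # regress the MLP toward all targets
```

In this reading, the opponent model is what makes the rollouts feasible: without a prediction of the opponent's moves, trajectories for the non-performed actions could not be simulated, and only the executed action's Q-value could be updated.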

Citation (APA)

Knegt, S. J. L., Drugan, M. M., & Wiering, M. A. (2019). Learning from Monte Carlo rollouts with opponent models for playing Tron. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11352 LNAI, pp. 105–129). Springer. https://doi.org/10.1007/978-3-030-05453-3_6
