Three-head neural network architecture for Monte Carlo tree search

Abstract

AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probabilities and the state-value estimate is used for leaf node evaluation. We propose a three-head neural net architecture with policy, state-value, and action-value outputs, which can lead to more efficient MCTS since the neural leaf estimate can still be back-propagated in the tree with delayed node expansion and evaluation. To effectively train the newly introduced action-value head on the same game dataset used for two-head nets, we exploit the optimal relations between parent and child nodes for data augmentation and regularization. In our experiments on the game of Hex, action-value head learning achieves error similar to the state-value prediction of a two-head architecture. The resulting neural net models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with three-head neural nets consistently outperforms the two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
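To make the architecture concrete, the following is a minimal sketch of a three-head network with a shared trunk feeding policy, state-value, and action-value heads. This is a hypothetical toy model over a flat feature vector (the paper's networks are convolutional); all layer sizes and names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ThreeHeadNet:
    """Toy three-head net: one shared trunk, three output heads.
    Illustrative sketch only; the paper uses a conv net over board planes."""

    def __init__(self, n_features, n_actions, hidden=32):
        # Randomly initialized weights stand in for trained parameters.
        self.W_trunk = rng.normal(0.0, 0.1, (n_features, hidden))
        self.W_policy = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.W_state = rng.normal(0.0, 0.1, (hidden, 1))
        self.W_action = rng.normal(0.0, 0.1, (hidden, n_actions))

    def forward(self, x):
        h = np.tanh(x @ self.W_trunk)          # shared representation
        logits = h @ self.W_policy
        p = np.exp(logits - logits.max())
        p /= p.sum()                           # policy: prior action probabilities
        v = np.tanh(h @ self.W_state)[0]       # state value in (-1, 1)
        q = np.tanh(h @ self.W_action)         # per-action values in (-1, 1)
        return p, v, q

# Example: 81 actions for a hypothetical 9x9 Hex board encoding.
net = ThreeHeadNet(n_features=64, n_actions=81)
p, v, q = net.forward(rng.normal(size=64))
```

In a PV-MCTS of the kind the abstract describes, the per-action `q` values let a selected leaf's value be back-propagated immediately, while the full expansion and evaluation of the new node can be deferred to a later visit.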

Cite

CITATION STYLE

APA

Gao, C., Müller, M., & Hayward, R. (2018). Three-head neural network architecture for Monte Carlo tree search. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 3762–3768). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/523
