Abstract
AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output provides prior action probabilities and the state-value estimate is used for leaf node evaluation. We propose a three-head neural network architecture with policy, state-value, and action-value outputs, which can lead to more efficient MCTS since a neural leaf estimate can still be back-propagated in the tree while node expansion and evaluation are delayed. To effectively train the newly introduced action-value head on the same game dataset used for two-head nets, we exploit the optimal relations between parent and child nodes for data augmentation and regularization. In our experiments for the game of Hex, the action-value head achieves prediction error similar to that of the state-value head of a two-head architecture. The resulting neural net models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with three-head neural nets consistently performs better than with two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
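As a rough illustration (not the authors' implementation), a three-head network might look like the minimal PyTorch sketch below: a shared convolutional trunk feeding separate policy, state-value, and action-value heads. The board size, input channels, trunk depth, and layer widths here are all illustrative assumptions.

```python
# Minimal sketch of a three-head network for MCTS (illustrative only).
# One forward pass yields priors p(s, .), a scalar v(s), and per-move
# q(s, .), so the search can back up a q-estimate for a child move
# before that child node is expanded and evaluated.
import torch
import torch.nn as nn

class ThreeHeadNet(nn.Module):
    def __init__(self, board_size=13, in_channels=3, width=64):
        super().__init__()
        n = board_size * board_size           # number of moves (assumed)
        self.trunk = nn.Sequential(           # shared feature extractor
            nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        flat = width * n
        # Policy head: prior logits over all moves.
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(flat, n))
        # State-value head: scalar v(s) in [-1, 1].
        self.state_value = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, 1), nn.Tanh())
        # Action-value head: one q(s, a) in [-1, 1] per move.
        self.action_value = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, n), nn.Tanh())

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h), self.state_value(h), self.action_value(h)

# Usage: a single evaluation of the parent state supplies values for
# all children, unlike a two-head net, which must expand and evaluate
# each child to obtain its value.
net = ThreeHeadNet()
p, v, q = net(torch.zeros(1, 3, 13, 13))
```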
Citation
Gao, C., Müller, M., & Hayward, R. (2018). Three-head neural network architecture for Monte Carlo tree search. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018) (pp. 3762–3768). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/523