The success in learning to play Go at a professional level is based on training a deep neural network on a large corpus of human expert games, and it raises questions about the availability, the limits, and the possibilities of this technique for other combinatorial games, especially when a larger body of additional expert knowledge is not accessible. As a step in this direction, we trained a value network for Tic-Tac-Toe, providing perfect winning information obtained by retrograde analysis. Next, we trained a policy network for the SameGame, a challenging combinatorial puzzle. Here, we discuss the interplay of deep learning with nested rollout policy adaptation (NRPA), a randomized algorithm for optimizing the outcome of single-player games. In both cases we observed that ordinary feed-forward neural networks can outperform convolutional ones in both accuracy and efficiency.
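For readers unfamiliar with NRPA, the following is a minimal sketch of the algorithm (in the standard form due to Rosin), applied to a hypothetical toy single-player game invented here for illustration; the reward table, step/action counts, and all function names are assumptions, not taken from the paper. Each level of the recursion runs the level below repeatedly, keeps the best playout found, and adapts a softmax move policy toward that best sequence.

```python
import math
import random

# Hypothetical toy domain: T steps, K actions per step; the score of a
# playout is the sum of entries from a fixed random reward table.
T, K = 5, 4
random.seed(0)
REWARD = [[random.random() for _ in range(K)] for _ in range(T)]

def legal_moves(step):
    return [(step, a) for a in range(K)]

def playout(policy):
    """Play one game, sampling moves with softmax(policy) probabilities."""
    seq, score = [], 0.0
    for step in range(T):
        moves = legal_moves(step)
        weights = [math.exp(policy.get(m, 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        seq.append(move)
        score += REWARD[move[0]][move[1]]
    return score, seq

def adapt(policy, seq, alpha=1.0):
    """Shift policy weights toward the best sequence found so far."""
    new = dict(policy)
    for (step, action) in seq:
        moves = legal_moves(step)
        weights = [math.exp(policy.get(m, 0.0)) for m in moves]
        total = sum(weights)
        # Decrease every legal move in proportion to its probability ...
        for m, w in zip(moves, weights):
            new[m] = new.get(m, 0.0) - alpha * w / total
        # ... and reinforce the move that the best sequence actually took.
        new[(step, action)] = new.get((step, action), 0.0) + alpha
    return new

def nrpa(level, policy, iterations=10):
    """Nested rollout policy adaptation: recurse, track best, adapt."""
    if level == 0:
        return playout(policy)
    best_score, best_seq = float("-inf"), None
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

score, seq = nrpa(2, {})
optimum = sum(max(row) for row in REWARD)
```

A level-2 search with 10 iterations per level performs 100 adapted playouts; on this tiny domain it typically recovers a score close to the table's optimum, illustrating how policy adaptation concentrates the rollouts without any neural guidance.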
Edelkamp, S. (2017). Deep or wide? Learning policy and value neural networks for combinatorial games. In Communications in Computer and Information Science (Vol. 705, pp. 19–33). Springer Verlag. https://doi.org/10.1007/978-3-319-57969-6_2