LinUCB Applied to Monte-Carlo Tree Search

Abstract

UCT is a de facto standard method for Monte-Carlo tree search (MCTS) algorithms, which have been applied to various domains and have achieved remarkable success. This study proposes a family of LinUCT algorithms that incorporate LinUCB into MCTS algorithms. LinUCB is a recently developed method that generalizes past episodes by ridge regression with feature vectors and rewards. LinUCB outperforms UCB1 in contextual multi-armed bandit problems. We introduce a straightforward application of LinUCB, LinUCT_PLAIN, by substituting UCB1 with LinUCB in UCT. We show that it does not work well owing to the minimax structure of game trees. To better handle such tree structures, we present LinUCT_RAVE and LinUCT_FP by further incorporating two existing techniques: rapid action value estimation (RAVE) and feature propagation, which recursively propagates the feature vector of a node to that of its parent. Experiments were conducted with a synthetic model, which is an extension of the standard incremental random tree model in which each node has a feature vector that represents the characteristics of the corresponding position. The experimental results indicate that LinUCT_RAVE, LinUCT_FP, and their combination LinUCT_RAVE-FP outperform UCT, especially when the branching factor is relatively large.
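To make the contrast between the two selection rules concrete, the following is a minimal sketch of a UCB1 index and of a per-arm LinUCB index built on ridge regression. It follows the commonly used disjoint-model formulation of LinUCB and is not taken from the paper; the names `ucb1_score`, `LinUCBArm`, `alpha`, and `c` are illustrative assumptions, and how LinUCT attaches such estimators to tree nodes is not shown here.

```python
import math


def ucb1_score(mean_reward, parent_visits, arm_visits, c=math.sqrt(2.0)):
    """UCB1 index: empirical mean plus a visit-count exploration bonus."""
    return mean_reward + c * math.sqrt(math.log(parent_visits) / arm_visits)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge-regression reward estimate for one arm (hypothetical helper).

    Maintains A = I + sum(x x^T) through its inverse, and b = sum(r * x).
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed parameter name)
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [_dot(row, v) for row in self.a_inv]

    def score(self, x):
        """Predicted reward theta^T x plus alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)  # ridge solution A^-1 b
        width = math.sqrt(_dot(x, self._a_inv_times(x)))
        return _dot(theta, x) + self.alpha * width

    def update(self, x, reward):
        """Fold in one (feature vector, reward) pair.

        Uses the Sherman-Morrison identity so that A^-1 is updated
        directly instead of re-inverting A after every observation.
        """
        u = self._a_inv_times(x)
        denom = 1.0 + _dot(x, u)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
```

Unlike UCB1, whose bonus depends only on visit counts, the LinUCB bonus shrinks along feature directions that have already been observed, so a reward seen for one feature vector also informs the score of similar feature vectors.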

Cite

Mandai, Y., & Kaneko, T. (2015). LinUCB applied to Monte-Carlo tree search. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9525, pp. 41–52). Springer Verlag. https://doi.org/10.1007/978-3-319-27992-3_5
