Abstract
Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of the key challenges is the tradeoff between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D2NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling is then used to guide action selection in the search tree. Experimental results confirm that our algorithm outperforms state-of-the-art approaches on several common benchmark problems.
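To make the core idea concrete, the following is a minimal sketch of Thompson sampling with a NormalGamma conjugate prior over an action's unknown reward distribution. It is not D2NG-POMCP itself: it leaves out the two Dirichlet components, the belief states, and the search tree, and reduces the problem to a flat bandit with Gaussian rewards. The class and function names (`NormalGamma`, `thompson_select`, `run_bandit`) and the prior hyperparameters are illustrative assumptions, not values taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """NormalGamma posterior over the (mean, precision) of a Gaussian reward.

    The default hyperparameters are illustrative assumptions that give a
    broad, weakly informative prior.
    """
    mu: float = 0.0      # prior mean of the reward mean
    lam: float = 0.01    # pseudo-count backing the prior mean
    alpha: float = 1.0   # shape of the Gamma over the precision
    beta: float = 100.0  # rate of the Gamma over the precision

    def update(self, x: float) -> None:
        """Standard conjugate update for a single observation x."""
        lam_new = self.lam + 1.0
        # beta must be updated with the old mu and lam, so do it first.
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * lam_new)
        self.mu = (self.lam * self.mu + x) / lam_new
        self.lam = lam_new
        self.alpha += 0.5

    def sample_mean(self, rng: random.Random) -> float:
        """Draw one plausible reward mean from the posterior."""
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        std = (1.0 / (self.lam * tau)) ** 0.5
        return rng.gauss(self.mu, std)


def thompson_select(posteriors, rng):
    """Pick the action whose sampled mean reward is largest."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Simulate a Gaussian bandit (unit noise) driven by Thompson sampling."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[a], 1.0)
        posteriors[a].update(reward)
        counts[a] += 1
    return posteriors, counts


if __name__ == "__main__":
    posteriors, counts = run_bandit([0.0, 1.0, 3.0])
    print("pull counts per action:", counts)
    print("posterior means:", [round(p.mu, 2) for p in posteriors])
```

Because each action is chosen according to a draw from its posterior rather than a point estimate, actions with few observations keep a wide posterior and are still tried occasionally, while well-explored actions are chosen in proportion to how likely they are to be the best.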
Cite
Bai, A., Wu, F., Zhang, Z., & Chen, X. (2014). Thompson sampling based Monte-Carlo planning in POMDPs. In Proceedings of the International Conference on Automated Planning and Scheduling, ICAPS (Vol. 2014-January, pp. 29–37). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v24i1.13616