Thompson Sampling Based Monte-Carlo Planning in POMDPs

Abstract

Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of the key challenges is the tradeoff between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D2NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters by choosing a conjugate prior in the form of a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling is then used to guide action selection in the search tree. Experimental results confirm that our algorithm outperforms state-of-the-art approaches on several common benchmark problems.
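
The NormalGamma part of this idea can be illustrated on a flat multi-armed bandit. The sketch below is not the authors' D2NG-POMCP implementation: it omits the search tree, the belief states, and both Dirichlet components, and it assumes each action's reward is roughly normal with unknown mean and precision. The names `NormalGamma`, `thompson_select`, and `simulate`, as well as the prior hyperparameters, are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """Conjugate posterior over the unknown mean and precision of a normal reward."""
    mu: float = 0.0     # posterior mean of the reward mean (assumed prior value)
    lam: float = 0.01   # pseudo-count behind mu (assumed weak prior)
    alpha: float = 1.0  # shape of the Gamma over the precision (assumed)
    beta: float = 1.0   # rate of the Gamma over the precision (assumed)

    def update(self, x: float) -> None:
        """Standard NormalGamma update for a single observed reward x."""
        mu, lam = self.mu, self.lam
        self.mu = (lam * mu + x) / (lam + 1.0)
        self.lam = lam + 1.0
        self.alpha += 0.5
        self.beta += lam * (x - mu) ** 2 / (2.0 * (lam + 1.0))

    def sample_mean(self, rng: random.Random) -> float:
        """Draw a precision, then a plausible reward mean given that precision."""
        # random.gammavariate takes (shape, scale), so the rate is inverted.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        return rng.gauss(self.mu, (self.lam * tau) ** -0.5)


def thompson_select(posteriors, rng):
    """Sample one mean per action and return the index of the largest sample."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_means, rounds, seed=0):
    """Run Thompson sampling on unit-variance normal arms; return pull counts."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(rounds):
        a = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[a], 1.0)
        posteriors[a].update(reward)
        counts[a] += 1
    return counts
```

Calling `simulate([0.0, 1.0], 2000)` returns how often each arm was pulled. Because actions are chosen by sampling from the posterior rather than by taking the current best estimate, poorly explored actions still get tried while their posteriors remain wide, which is the exploration and exploitation balance the abstract refers to.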

Cite

Bai, A., Wu, F., Zhang, Z., & Chen, X. (2014). Thompson sampling based monte-carlo planning in POMDPs. In Proceedings International Conference on Automated Planning and Scheduling, ICAPS (Vol. 2014-January, pp. 29–37). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v24i1.13616
