Thompson sampling based monte-carlo planning in POMDPs

Aijun Bai; Feng Wu; Zongzhang Zhang; Xiaoping Chen

Conference Proceedings (Open Access)

Proceedings International Conference on Automated Planning and Scheduling, ICAPS (2014), Vol. 2014-January, pp. 29-37

DOI: 10.1609/icaps.v24i1.13616

Abstract

Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the tradeoff between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm - Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D2NG-POMCP) - treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet and one NormalGamma distributions. Thompson sampling is exploited to guide the action selection in the search tree. Experimental results confirmed that our algorithm outperforms the state-of-the-art approaches on several common benchmark problems.
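
As a rough illustration of the idea in the abstract, the sketch below applies Thompson sampling with a NormalGamma conjugate prior to a plain multi-armed bandit. It is not the D2NG-POMCP algorithm: it omits the search tree, the belief states, and both Dirichlet components, and it assumes Gaussian rewards with unknown mean and precision. The names `NormalGammaArm`, `thompson_select`, and `run_bandit`, the prior hyperparameters, and the toy reward means are all hypothetical choices made for this example.

```python
import math
import random


class NormalGammaArm:
    """NormalGamma posterior over the (mean, precision) of one action's reward.

    Hyperparameters (mu, lam, alpha, beta) follow the standard conjugate
    parameterisation; the default prior values are illustrative assumptions.
    """

    def __init__(self, mu0=0.0, lam=1.0, alpha=1.0, beta=1.0):
        self.mu = mu0
        self.lam = lam
        self.alpha = alpha
        self.beta = beta

    def update(self, x):
        """Standard conjugate update for a single observed reward x."""
        lam = self.lam
        self.beta += lam * (x - self.mu) ** 2 / (2.0 * (lam + 1.0))
        self.mu = (lam * self.mu + x) / (lam + 1.0)
        self.lam = lam + 1.0
        self.alpha += 0.5

    def sample_mean(self, rng):
        """Draw a plausible mean reward from the current posterior."""
        # Precision tau ~ Gamma(shape=alpha, rate=beta); gammavariate takes a scale.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        # Mean | tau ~ Normal(mu, 1 / (lam * tau)).
        return rng.gauss(self.mu, math.sqrt(1.0 / (self.lam * tau)))


def thompson_select(arms, rng):
    """Sample one mean per action and return the index of the largest sample."""
    samples = [arm.sample_mean(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Toy loop: rewards are Gaussian with unit variance around true_means."""
    rng = random.Random(seed)
    arms = [NormalGammaArm() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = thompson_select(arms, rng)
        reward = rng.gauss(true_means[a], 1.0)
        arms[a].update(reward)
        counts[a] += 1
    return arms, counts


if __name__ == "__main__":
    arms, counts = run_bandit([0.0, 1.0, 3.0])
    print("pull counts per action:", counts)
```

Because each action is chosen with the probability that its sampled mean is the largest, actions with uncertain posteriors still get tried occasionally, while pulls concentrate on the action that looks best as evidence accumulates.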

Cite

Bai, A., Wu, F., Zhang, Z., & Chen, X. (2014). Thompson sampling based monte-carlo planning in POMDPs. In Proceedings International Conference on Automated Planning and Scheduling, ICAPS (Vol. 2014-January, pp. 29–37). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v24i1.13616
