Factorized asymptotic Bayesian policy search for POMDPs

Abstract

This paper proposes a novel direct policy search (DPS) method with model selection for partially observable Markov decision processes (POMDPs). DPS methods have become standard for learning POMDPs because of their computational efficiency and their natural ability to maximize total rewards. An important open challenge in making the best use of DPS methods is model selection, i.e., determining the proper dimensionality of hidden states and the complexity of policy functions, so as to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization and derives a marginalized weighted log-likelihood (MWL) for POMDPs that combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS), which explores the model and the policy that maximize MWL by extending recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, both in model selection and in expected total rewards.
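As an illustrative sketch only (not the paper's derivation), a reward-weighted marginal log-likelihood for a latent-variable policy model might take the following generic form, where the trajectory notation, the use of total reward as a weight, and the marginalized hidden-state sequence are all assumptions made for illustration:

$$
\mathcal{L}(\theta) \;=\; \sum_{n=1}^{N} R(\tau_n)\,\log \int p(\tau_n, z_n \mid \theta)\, dz_n ,
$$

where $\tau_n$ denotes an observed trajectory, $R(\tau_n)$ its total reward used as a weight, $z_n$ the hidden-state sequence being marginalized out, and $\theta$ the model and policy parameters. In FAB-style approaches, the marginalization is typically approximated asymptotically, which yields a penalty term that drives model selection over the hidden-state dimensionality.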

Citation (APA)

Imaizumi, M., & Fujimaki, R. (2017). Factorized asymptotic Bayesian policy search for POMDPs. In IJCAI International Joint Conference on Artificial Intelligence (pp. 4346–4352). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/607
