Abstract
State-of-the-art sampling-based online POMDP solvers compute near-optimal policies for POMDPs with very large state spaces. However, when faced with large observation spaces, they may become overly optimistic and compute suboptimal policies because of particle divergence. This paper presents a new online POMDP solver, DESPOT-α, which builds on the widely used DESPOT solver. DESPOT-α improves the practical performance of online planning for POMDPs with large observation spaces as well as large state spaces. Like DESPOT, DESPOT-α uses the particle belief approximation and searches a determinized sparse belief tree. To tackle large observation spaces, DESPOT-α shares sub-policies among many observations during online policy computation. The value function of a sub-policy is a linear function of the belief, commonly known as an α-vector. We introduce a particle approximation of the α-vector to improve the efficiency of online policy search. We further speed up DESPOT-α using the CPU and GPU parallelization ideas introduced in HyP-DESPOT. Experimental results show that DESPOT-α/HyP-DESPOT-α outperform DESPOT/HyP-DESPOT on POMDPs with large observation spaces, including a complex simulation task involving an autonomous vehicle driving among many pedestrians.
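For a belief b represented by a weighted particle set {(s_i, w_i)}, the α-vector value V_π(b) = Σ_s b(s) α_π(s) can be approximated by a weighted sum over the particles, and the best sub-policy is the one whose approximated value is largest. The sketch below illustrates this particle approximation under those assumptions; the function names and the per-state α representation are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): particle approximation of
# alpha-vector values, assuming weighted particles and per-state alpha values.

def policy_value(particles, weights, alpha):
    """Approximate V_pi(b) = sum_s b(s) * alpha_pi(s) over weighted particles.

    particles: list of sampled states representing the belief
    weights:   importance weights of the particles (need not be normalized)
    alpha:     callable mapping a state s to the sub-policy's value alpha_pi(s)
    """
    total = sum(weights)
    return sum(w * alpha(s) for s, w in zip(particles, weights)) / total

def best_sub_policy(particles, weights, alphas):
    """Return the index of the sub-policy with the highest approximated value."""
    return max(range(len(alphas)),
               key=lambda i: policy_value(particles, weights, alphas[i]))

# Example: two particles and two candidate sub-policies as state -> value maps.
particles = ["s1", "s2"]
weights = [0.7, 0.3]
alpha1 = {"s1": 5.0, "s2": 1.0}.get
alpha2 = {"s1": 2.0, "s2": 4.0}.get
print(policy_value(particles, weights, alpha1))               # 3.8
print(best_sub_policy(particles, weights, [alpha1, alpha2]))  # 0
```

Because the same α-vector can score any particle belief, one sub-policy evaluated this way can be shared across the many beliefs reached under different observations, which is the source of the efficiency gain the abstract describes.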
Citation
Garg, N. P., Hsu, D., & Lee, W. S. (2019). DESPOT-α: Online POMDP Planning With Large State And Observation Spaces. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2019.XV.006