DESPOT-α: Online POMDP Planning With Large State And Observation Spaces


Abstract

State-of-the-art sampling-based online POMDP solvers compute near-optimal policies for POMDPs with very large state spaces. However, when faced with large observation spaces, they may become overly optimistic and compute suboptimal policies because of particle divergence. This paper presents a new online POMDP solver, DESPOT-α, which builds on the widely used DESPOT solver. DESPOT-α improves the practical performance of online planning for POMDPs with large observation as well as state spaces. Like DESPOT, DESPOT-α uses the particle belief approximation and searches a determinized sparse belief tree. To tackle large observation spaces, DESPOT-α shares sub-policies among many observations during online policy computation. The value function of a sub-policy is a linear function of the belief, commonly known as an α-vector. We introduce a particle approximation of the α-vector to improve the efficiency of online policy search. We further speed up DESPOT-α using the CPU and GPU parallelization ideas introduced in HyP-DESPOT. Experimental results show that DESPOT-α/HyP-DESPOT-α outperform DESPOT/HyP-DESPOT on POMDPs with large observation spaces, including a complex simulation task involving an autonomous vehicle driving among many pedestrians.
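The α-vector idea the abstract relies on can be sketched briefly. In α-vector form, the value of a sub-policy π is linear in the belief, V_π(b) = Σ_s b(s)·α_π(s); with a weighted particle belief this reduces to a weighted sum over particles, which is what lets one sub-policy be evaluated cheaply at the many beliefs produced by different observations. The sketch below is illustrative only (the function and variable names are not from the paper), assuming a particle belief represented as states with normalized weights:

```python
def alpha_vector_value(particles, weights, alpha):
    """Estimate V_pi(b) ~ sum_i w_i * alpha[s_i] at a particle belief.

    particles : list of state identifiers (hashable)
    weights   : normalized particle weights, same length as particles
    alpha     : dict mapping state -> value of the sub-policy from that state
    """
    return sum(w * alpha[s] for s, w in zip(particles, weights))

# Example: a 3-particle belief over two states.
particles = ["s0", "s1", "s0"]
weights = [0.5, 0.3, 0.2]
alpha = {"s0": 10.0, "s1": 4.0}
value = alpha_vector_value(particles, weights, alpha)
# 0.5*10 + 0.3*4 + 0.2*10 = 8.2
```

Because the same α (i.e., the same sub-policy) can be reused across sibling beliefs reached by different observations, the per-observation cost is just this weighted sum rather than a fresh policy search.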

Citation (APA)

Garg, N. P., Hsu, D., & Lee, W. S. (2019). DESPOT-α: Online POMDP Planning With Large State And Observation Spaces. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2019.XV.006
