Abstract
Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP, a state-of-the-art offline solver, to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source, a challenging real-world problem.
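The abstract does not spell out the belief-dependent rewards it refers to, but a common choice in information-gathering POMDPs is an information measure such as negative belief entropy, which rewards concentrated beliefs over the target location. The sketch below is a minimal, hypothetical illustration of that general idea, not the paper's actual reward formulation; the function name `belief_entropy_reward` and the grid size are assumptions for the example.

```python
import numpy as np

def belief_entropy_reward(belief, eps=1e-12):
    """Illustrative belief-dependent reward: negative Shannon entropy.

    A belief that is more concentrated over possible target locations
    yields a higher (less negative) reward, which encourages actions
    whose observations are informative about the target.
    """
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()                            # normalize defensively
    return float(np.sum(b * np.log(b + eps)))  # equals -H(b)

# Example: a peaked belief over a hypothetical 16-cell search grid
# earns a higher reward than a uniform (uninformed) belief.
uniform = np.full(16, 1.0 / 16)
peaked = np.array([0.85] + [0.01] * 15)
print(belief_entropy_reward(uniform))  # about -2.77 (ln 16)
print(belief_entropy_reward(peaked))   # about -0.83
```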
CITATION STYLE
Dressel, L., & Kochenderfer, M. J. (2017). Efficient decision-theoretic target localization. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) (Vol. 27, pp. 70–78). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v27i1.13832