Decentralized POMDPs

Abstract

This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework. In a Dec-POMDP, a team of agents collaborates to maximize a global reward based only on local information. This means that agents do not observe a Markovian signal during execution, and therefore the agents' individual policies map from histories to actions. Searching for an optimal joint policy is an extremely hard problem: it is NEXP-complete. This suggests, assuming NEXP≠EXP, that any optimal solution method will require doubly exponential time in the worst case. This chapter focuses on planning for Dec-POMDPs over a finite horizon. It covers the forward heuristic search approach to solving Dec-POMDPs as well as the backward dynamic programming approach, and discusses how these relate to the optimal Q-value function of a Dec-POMDP. Finally, it provides pointers to other solution methods and further related topics.
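
To make the model and the notion of history-dependent policies concrete, the sketch below (an illustrative toy example, not code from the chapter; the two-agent model, names, and numbers are assumptions) defines a tiny Dec-POMDP and evaluates every deterministic horizon-2 joint policy by brute force, where each individual policy maps an agent's own observation history to an action.

```python
# Minimal sketch of a two-agent Dec-POMDP and brute-force joint-policy
# evaluation. The toy dynamics, rewards, and names are illustrative only.
from itertools import product

actions = ["a", "b"]          # per-agent action set
observations = ["o0", "o1"]   # per-agent observation set
horizon = 2

def transition(s, joint_a):
    """P(s' | s, joint action): the state flips when the agents' actions differ."""
    flip = joint_a[0] != joint_a[1]
    s_next = ("s1" if s == "s0" else "s0") if flip else s
    return {s_next: 1.0}

def observe(s_next, joint_a):
    """P(joint observation | s', joint action): both agents observe the state index."""
    o = "o0" if s_next == "s0" else "o1"
    return {(o, o): 1.0}

def reward(s, joint_a):
    """Shared team reward: +1 for matching actions in state s1, 0 otherwise."""
    return 1.0 if s == "s1" and joint_a[0] == joint_a[1] else 0.0

def policy_value(joint_policy, b0):
    """Expected total reward of a deterministic joint policy.
    joint_policy[i] maps agent i's observation history (a tuple) to an action."""
    def recurse(s, histories, t):
        if t == horizon:
            return 0.0
        joint_a = tuple(joint_policy[i][histories[i]] for i in range(2))
        value = reward(s, joint_a)
        for s_next, p_s in transition(s, joint_a).items():
            for joint_o, p_o in observe(s_next, joint_a).items():
                next_h = tuple(histories[i] + (joint_o[i],) for i in range(2))
                value += p_s * p_o * recurse(s_next, next_h, t + 1)
        return value
    return sum(p * recurse(s, ((), ()), 0) for s, p in b0.items())

# All observation histories an agent can hold before its last decision (horizon 2).
histories = [()] + [(o,) for o in observations]

def all_individual_policies():
    """Enumerate every deterministic mapping from observation history to action."""
    for choice in product(actions, repeat=len(histories)):
        yield dict(zip(histories, choice))

b0 = {"s0": 0.5, "s1": 0.5}   # initial state distribution
best_value, best_joint = max(
    ((policy_value(joint, b0), joint)
     for joint in product(all_individual_policies(), repeat=2)),
    key=lambda x: x[0],
)
print("best horizon-%d value:" % horizon, best_value)
print("agent 0 policy:", best_joint[0])
```

The number of individual policies grows exponentially in the number of observation histories, which itself grows exponentially in the horizon, so brute-force enumeration of this kind quickly becomes infeasible; this is the motivation for the heuristic search and dynamic programming approaches the chapter covers.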

Citation (APA)

Oliehoek, F. A. (2012). Decentralized POMDPs. In Adaptation, Learning, and Optimization (Vol. 12, pp. 471–503). Springer Verlag. https://doi.org/10.1007/978-3-642-27645-3_15
