This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is unknown or only poorly specified. We propose two approaches to this problem. The first incorporates a model of the uncertainty directly into the POMDP planning problem; it has theoretical guarantees, but becomes impractical when many parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model through selected queries while still optimizing reward. Results show good performance of the algorithm even on large problems: the most useful model parameters are learned quickly, and the agent accumulates high reward throughout the learning process.
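The abstract does not spell out MEDUSA's mechanics, but a minimal sketch may help fix intuition. It assumes, as is common in Bayesian reinforcement learning (and is not stated in this abstract), that uncertainty over the unknown transition probabilities is tracked with Dirichlet distributions that are reinforced whenever a query reveals the true state; the class and method names here (`DirichletPOMDPModel`, `disagreement`) are hypothetical, and planning over the sampled models is omitted.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


class DirichletPOMDPModel:
    """Dirichlet counts over each uncertain transition distribution T(s' | s, a)."""

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0):
        # One Dirichlet parameter vector per (state, action) pair.
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def sample(self) -> np.ndarray:
        """Draw one plausible transition model from the current posterior."""
        T = np.empty_like(self.alpha)
        for s in range(self.alpha.shape[0]):
            for a in range(self.alpha.shape[1]):
                T[s, a] = rng.dirichlet(self.alpha[s, a])
        return T

    def update(self, s: int, a: int, s_next: int, weight: float = 1.0) -> None:
        """A state query revealed the (s, a, s') transition: reinforce it."""
        self.alpha[s, a, s_next] += weight

    def disagreement(self, n_samples: int = 10) -> float:
        """Spread among sampled models; a crude signal of when a query is worth its cost."""
        samples = np.stack([self.sample() for _ in range(n_samples)])
        return float(samples.std(axis=0).mean())


# Tiny usage example: repeated queries about one transition shrink the
# posterior spread, so fewer queries are needed as the model improves.
model = DirichletPOMDPModel(n_states=3, n_actions=2)
print("disagreement before queries:", model.disagreement())
for _ in range(50):
    model.update(s=0, a=1, s_next=2)  # oracle revealed this transition
print("disagreement after queries: ", model.disagreement())
```

In this sketch, only the parameters the agent actually queries become sharply known, which mirrors the abstract's claim that the most useful parameters are learned quickly while the rest stay uncertain.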
CITATION STYLE
Jaulmes, R., Pineau, J., & Precup, D. (2005). Active learning in partially observable Markov decision processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 601–608). https://doi.org/10.1007/11564096_59