In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our proposed approach rests on the insight that the MDP structure can be encapsulated into an adequate state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also generalizes better than other supervised learning methods that fail to consider the MDP structure. © 2010 Springer-Verlag Berlin Heidelberg.
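The core idea described above can be illustrated with a minimal sketch: given a precomputed MDP-induced distance between states, learning from demonstration reduces to a kernel-weighted classifier that maps a query state to the action of metrically similar demonstrated states. The distance matrix `D`, the kernel choice, and the function `kernel_policy` below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def kernel_policy(D, demo_states, demo_actions, query, sigma=1.0):
    """Predict an action for `query` by kernel-weighted voting over
    demonstrated (state, action) pairs. D[i, j] is assumed to be a
    precomputed MDP-induced distance between states i and j."""
    # Kernel weights: states close to the query under the MDP metric
    # contribute more to the vote.
    w = np.exp(-(D[query, demo_states] ** 2) / (2 * sigma ** 2))
    actions = np.unique(demo_actions)
    # Pick the action with the largest total kernel mass.
    scores = [w[np.asarray(demo_actions) == a].sum() for a in actions]
    return actions[int(np.argmax(scores))]

# Toy 4-state example with a hand-made symmetric "MDP metric":
# states 0 and 1 are close, states 2 and 3 are close.
D = np.array([[0., 1., 3., 3.],
              [1., 0., 3., 3.],
              [3., 3., 0., 1.],
              [3., 3., 1., 0.]])
demo_states = np.array([0, 2])    # states where the expert acted
demo_actions = np.array([0, 1])   # actions the expert took there
print(kernel_policy(D, demo_states, demo_actions, query=1))  # → 0
```

Because the metric reflects the MDP dynamics rather than raw feature distances, the classifier generalizes to states the expert never visited but that behave similarly under the MDP.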
CITATION STYLE
Melo, F. S., & Lopes, M. (2010). Learning from demonstration using MDP induced metrics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6322 LNAI, pp. 385–401). https://doi.org/10.1007/978-3-642-15883-4_25