Learning from demonstration using MDP induced metrics

Abstract

In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our proposed approach rests on the insight that the MDP structure can be encapsulated in a suitable state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also attains better generalization than other supervised learning methods that fail to consider the MDP structure. © 2010 Springer-Verlag Berlin Heidelberg.
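
To make the idea of classifying with an MDP-induced metric concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a small deterministic MDP, a bisimulation-style metric computed by fixed-point iteration, and an exponential kernel vote over demonstrated actions. The toy MDP (two identical five-position chains), the function names `deterministic_mdp_metric` and `kernel_policy`, the `(1 - gamma)` / `gamma` weighting, and the bandwidth are all illustrative assumptions.

```python
import numpy as np

LEFT, RIGHT = 0, 1

def deterministic_mdp_metric(next_state, reward, gamma=0.9, iters=200):
    """Fixed-point iteration for a bisimulation-style state metric.

    Assumption: transitions are deterministic, so the distance between
    next-state distributions reduces to the distance between next states.
    """
    n_states, n_actions = reward.shape
    d = np.zeros((n_states, n_states))
    for _ in range(iters):
        new_d = np.zeros_like(d)
        for a in range(n_actions):
            # |r(s, a) - r(s', a)| for every pair of states
            r_gap = np.abs(reward[:, a][:, None] - reward[:, a][None, :])
            # d(next(s, a), next(s', a)) for every pair of states
            nxt = next_state[:, a]
            t_gap = d[np.ix_(nxt, nxt)]
            new_d = np.maximum(new_d, (1 - gamma) * r_gap + gamma * t_gap)
        d = new_d
    return d

def kernel_policy(d, demo_states, demo_actions, n_actions, bandwidth=0.02):
    """Pick, for every state, the action with the largest kernel-weighted vote."""
    kernel = np.exp(-d[:, demo_states] / bandwidth)  # (n_states, n_demos)
    votes = np.zeros((d.shape[0], n_actions))
    for a in range(n_actions):
        votes[:, a] = kernel[:, demo_actions == a].sum(axis=1)
    return votes.argmax(axis=1)

# Hypothetical toy MDP: two disconnected but identical chains of 5 positions.
# Chain A holds states 0..4, chain B holds states 5..9; the goal is position 2.
n_pos, goal, n_actions = 5, 2, 2
n_states = 2 * n_pos
next_state = np.zeros((n_states, n_actions), dtype=int)
reward = np.zeros((n_states, n_actions))
for s in range(n_states):
    chain, pos = divmod(s, n_pos)
    moves = {LEFT: max(pos - 1, 0), RIGHT: min(pos + 1, n_pos - 1)}
    for a, new_pos in moves.items():
        next_state[s, a] = chain * n_pos + new_pos
        reward[s, a] = 1.0 if new_pos == goal else 0.0  # reward for reaching the goal

d = deterministic_mdp_metric(next_state, reward)

# Demonstrations are given in chain A only: move toward the goal.
demo_states = np.array([0, 1, 3, 4])
demo_actions = np.array([RIGHT, RIGHT, LEFT, LEFT])

policy = kernel_policy(d, demo_states, demo_actions, n_actions)
print(policy[5:])  # predicted actions for the undemonstrated chain B
```

In this toy setting the metric assigns distance zero to corresponding positions of the two chains, so the demonstrations from chain A carry over to chain B. A kernel on raw state indices would instead treat state 5 as closest to demonstrated state 4, which sits on the opposite side of the goal.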

Cite

Melo, F. S., & Lopes, M. (2010). Learning from demonstration using MDP induced metrics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6322 LNAI, pp. 385–401). https://doi.org/10.1007/978-3-642-15883-4_25
