Learning from demonstration using MDP induced metrics

Abstract

In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying Markov decision process (MDP), we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our approach rests on the insight that the MDP structure can be encapsulated in a suitable state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also generalizes better than other supervised learning methods that fail to consider the MDP structure. © 2010 Springer-Verlag Berlin Heidelberg.
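
The idea of treating learning from demonstration as classification under a state-space metric can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a precomputed distance matrix `dist` (standing in for an MDP-induced metric, whose construction the abstract does not describe), an exponential kernel, and a hypothetical bandwidth parameter `sigma`. Each demonstrated state votes for its action, weighted by how close it is to the query state under the metric.

```python
import math
from collections import defaultdict


def kernel_policy(dist, demos, sigma=1.0):
    """Build a policy from demonstrations by kernel-weighted voting.

    dist:  dist[s][t] is an assumed, precomputed distance between states
           (a stand-in for an MDP-induced metric).
    demos: list of (state, action) pairs provided by the demonstrator.
    sigma: kernel bandwidth (hypothetical tuning parameter).
    """
    def policy(state):
        votes = defaultdict(float)
        for s, a in demos:
            # Demonstrated states that are close under the metric count more.
            votes[a] += math.exp(-dist[state][s] / sigma)
        return max(votes, key=votes.get)

    return policy


# Toy 4-state example. The distances are invented for illustration:
# states 0 and 3 are close to each other, as are states 1 and 2,
# even though they are not adjacent by index.
dist = [
    [0.0, 2.0, 2.0, 0.1],
    [2.0, 0.0, 0.2, 2.0],
    [2.0, 0.2, 0.0, 2.0],
    [0.1, 2.0, 2.0, 0.0],
]
demos = [(0, "left"), (1, "right")]  # only two states are demonstrated

policy = kernel_policy(dist, demos, sigma=0.5)
actions = [policy(s) for s in range(4)]
print(actions)
```

In this toy setting the undemonstrated states 2 and 3 inherit the actions of the demonstrated states nearest to them under `dist`, which is the sense in which a metric that reflects the task structure can drive generalization.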

Melo, F. S., & Lopes, M. (2010). Learning from demonstration using MDP induced metrics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6322 LNAI, pp. 385–401). https://doi.org/10.1007/978-3-642-15883-4_25
