Active learning for reward estimation in inverse reinforcement learning



Abstract

Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states. The purpose of our algorithm is to estimate the reward function with accuracy comparable to that of other methods from the literature while reducing the number of policy samples required from the expert. We also discuss the use of our algorithm in higher-dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities. © 2009 Springer Berlin Heidelberg.
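
The sketch below is not the authors' algorithm; it only illustrates, under simplifying assumptions, the kind of active querying the abstract describes: maintain a set of candidate reward functions, ask the demonstrator for an action at the state where the candidates' optimal policies disagree most, and reweight the candidates by how well they explain the observed action. The gridworld-sized random MDP, the softmax expert model, and all numerical values are illustrative placeholders.

# Minimal sketch (assumed setup, not the paper's method) of active query
# selection for reward estimation in a tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 16, 4, 0.95

# Random tabular transition model P[a, s, s'] (illustrative).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
true_reward = rng.normal(size=n_states)          # hidden from the learner

def greedy_policy(reward, iters=200):
    """Value iteration for a state-based reward; returns the greedy action per state."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[None, :] + gamma * P @ V      # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def expert_action(state, beta=5.0):
    """Simulated demonstrator: noisily optimal with respect to the true reward."""
    V = np.zeros(n_states)
    for _ in range(200):
        Q = true_reward[None, :] + gamma * P @ V
        V = Q.max(axis=0)
    p = np.exp(beta * (Q[:, state] - Q[:, state].max()))
    return rng.choice(n_actions, p=p / p.sum())

# Monte Carlo approximation of the reward posterior: weighted candidate rewards.
n_candidates = 200
candidates = rng.normal(size=(n_candidates, n_states))
weights = np.ones(n_candidates) / n_candidates

for query in range(10):
    policies = np.array([greedy_policy(r) for r in candidates])  # [candidate, state]

    # Per-state disagreement: entropy of the candidates' greedy-action distribution.
    disagreement = np.zeros(n_states)
    for s in range(n_states):
        counts = np.bincount(policies[:, s], weights=weights, minlength=n_actions)
        p = counts / counts.sum()
        disagreement[s] = -(p[p > 0] * np.log(p[p > 0])).sum()

    s_query = int(disagreement.argmax())          # most informative state
    a_demo = expert_action(s_query)               # query the demonstrator there

    # Crude likelihood: does the candidate's greedy policy match the demonstration?
    agree = (policies[:, s_query] == a_demo)
    weights *= np.where(agree, 0.9, 0.1)
    weights /= weights.sum()

    est = weights @ candidates
    corr = np.corrcoef(est, true_reward)[0, 1]
    print(f"query {query}: state {s_query}, corr(est, true) = {corr:.2f}")

Choosing the query state by policy disagreement is only one possible uncertainty measure; the same loop structure accommodates other criteria, and a gradient-based posterior update could replace the candidate reweighting for higher-dimensional problems.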

Citation (APA)

Lopes, M., Melo, F., & Montesano, L. (2009). Active learning for reward estimation in inverse reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5782 LNAI, pp. 31–46). https://doi.org/10.1007/978-3-642-04174-7_3
