Inverse Reinforcement Learning From Like-Minded Teachers


Abstract

We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their (linear) reward functions — while different from each other — are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property — that of matching feature expectations — yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state — a setting that is akin to multi-armed bandits.
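The abstract's main result relies on a basic property of linear rewards: a policy's expected reward depends on the policy only through its feature expectations. Two policies with equal feature expectations therefore earn equal reward under *every* linear reward function, including the unknown underlying one. The sketch below is not the authors' algorithm. It works in the one-state (bandit-like) setting and makes several assumptions of its own: Gaussian perturbations of a hypothetical underlying weight vector `w_star`, a made-up set of action feature vectors, and illustrative helper names. It simulates like-minded teachers and takes the empirical distribution of their actions as one simple feature-matching policy. It then checks that this policy's reward under `w_star` can fall short of the optimum but never exceeds it.

```python
import random


def simulate_teachers(features, w_star, sigma, n_teachers, seed=0):
    """Hypothetical teachers: each perturbs w_star with i.i.d. Gaussian noise
    (an assumed noise model) and picks the action maximizing its own linear
    reward in a one-state MDP."""
    rng = random.Random(seed)
    d = len(w_star)
    choices = []
    for _ in range(n_teachers):
        w_i = [w_star[j] + rng.gauss(0.0, sigma) for j in range(d)]
        best = max(
            range(len(features)),
            key=lambda a: sum(w_i[j] * features[a][j] for j in range(d)),
        )
        choices.append(best)
    return choices


def feature_expectation(policy, features):
    """Expected feature vector of a stochastic policy (a distribution over actions)."""
    d = len(features[0])
    return [sum(p * features[a][j] for a, p in enumerate(policy)) for j in range(d)]


def empirical_policy(choices, n_actions):
    """Empirical distribution of observed teacher actions. By construction it
    matches the teachers' average feature vector exactly."""
    counts = [0] * n_actions
    for a in choices:
        counts[a] += 1
    return [c / len(choices) for c in counts]


def reward(policy, features, w):
    """Expected linear reward w . mu(policy); depends on the policy only via mu."""
    mu = feature_expectation(policy, features)
    return sum(w[j] * mu[j] for j in range(len(w)))


if __name__ == "__main__":
    # Hypothetical instance: the third action's features are the midpoint of
    # the first two, so a 50/50 mix of actions 0 and 1 matches it exactly.
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    w_star = [1.0, 0.5]
    choices = simulate_teachers(features, w_star, sigma=0.5, n_teachers=1000)
    pi = empirical_policy(choices, len(features))
    optimal = max(sum(w * f for w, f in zip(w_star, feat)) for feat in features)
    print("empirical policy:", pi)
    print("gap to optimal under w_star:", optimal - reward(pi, features, w_star))
```

Because the teachers' noisy rewards sometimes rank a worse action first, the empirical policy mixes in suboptimal actions. That is why feature matching can only promise approximate optimality with respect to the underlying reward.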

Citation (APA)

Noothigattu, R., Yan, T., & Procaccia, A. D. (2021). Inverse Reinforcement Learning From Like-Minded Teachers. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 10B, pp. 9197–9204). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i10.17110
