Inverse Reinforcement Learning From Like-Minded Teachers

Ritesh Noothigattu; Tom Yan; Ariel D. Procaccia

Conference Proceedings (Open Access)

35th AAAI Conference on Artificial Intelligence, AAAI 2021 (2021) 10B 9197-9204

DOI: 10.1609/aaai.v35i10.17110

Abstract

We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their (linear) reward functions — while different from each other — are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property — that of matching feature expectations — yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state — a setting that is akin to multi-armed bandits.
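
The property named in the abstract, matching feature expectations, can be illustrated with a minimal sketch. This is not the paper's algorithm. It only shows how a discounted feature expectation might be estimated from teacher demonstrations, assuming each trajectory is given as a list of feature vectors, a discount factor `gamma` is known, and all teachers' trajectories are pooled with equal weight. The names `feature_expectation`, `linear_return`, and the toy data are hypothetical.

```python
from typing import List, Sequence

# Assumption: a trajectory is a sequence of feature vectors phi(s_t, a_t),
# one per time step. This representation is illustrative only.
Trajectory = Sequence[Sequence[float]]


def feature_expectation(trajectories: Sequence[Trajectory], gamma: float) -> List[float]:
    """Monte Carlo estimate of mu = E[sum_t gamma^t * phi(s_t, a_t)]."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            discount = gamma ** t
            for k in range(dim):
                mu[k] += discount * phi[k]
    # Average the discounted feature sums over all trajectories.
    return [m / len(trajectories) for m in mu]


def linear_return(weights: Sequence[float], mu: Sequence[float]) -> float:
    """Expected discounted return under a linear reward r = weights . phi."""
    return sum(w * m for w, m in zip(weights, mu))


# Toy demonstrations from two hypothetical teachers, one trajectory each.
teacher_a = [[[1.0, 0.0], [0.0, 1.0]]]
teacher_b = [[[0.0, 1.0], [0.0, 1.0]]]

# Pool the teachers' trajectories and estimate the target feature expectation.
mu_target = feature_expectation(teacher_a + teacher_b, gamma=0.5)
value = linear_return([2.0, 1.0], mu_target)
```

Because the reward is linear in the features, a policy's expected discounted return equals the inner product of the reward weights with its feature expectation. Two policies with the same feature expectation therefore have the same value under any linear reward, which is why matching this quantity is a natural target when the true weights are unknown.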

Citation (APA)

Noothigattu, R., Yan, T., & Procaccia, A. D. (2021). Inverse Reinforcement Learning From Like-Minded Teachers. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 10B, pp. 9197–9204). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i10.17110
