Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Abstract

Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with an expert policy for each environment. We assume that the experts in the multiple environments share a single reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a single linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality constraints of all the given environments. Satisfying the constraints of all the given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy grid world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments.
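The construction described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it follows the classic Ng–Russell linear programming IRL formulation, introduces epigraph variables t_{k,s} per environment k and slack variables u for the L1 penalty, shares one reward vector R across all environments, and imposes the expert-optimality constraints of every environment in a single LP. All function names, parameters, and the toy chain MDP below are illustrative assumptions (the paper's experiments use Windy grid world); it assumes every state has at least one non-expert action so the epigraph variables are bounded.

```python
import numpy as np
from scipy.optimize import linprog

def multi_env_lp_irl(envs, gamma=0.9, l1=1.0, r_max=1.0):
    """Estimate one reward shared by experts acting in several dynamics.

    envs: list of (P, policy) pairs, where P has shape (A, S, S) with
    P[a, s, s'] = Pr(s' | s, a), and policy[s] is the expert action in s.
    Solves one LP: maximize the sum of each environment's Ng-Russell
    objective subject to the optimality constraints of every environment.
    """
    S = envs[0][0].shape[1]
    K = len(envs)
    # Variable layout: [R (S) | t_1 .. t_K (S each) | u (S)].
    n = S + K * S + S
    A_ub, b_ub = [], []

    def row(r_part, k, t_part, u_part):
        out = np.zeros(n)
        out[:S] = r_part
        if k is not None:
            out[S + k * S: S + (k + 1) * S] = t_part
        out[S + K * S:] = u_part
        return out

    for k, (P, policy) in enumerate(envs):
        P_pi = P[policy, np.arange(S), :]           # dynamics under the expert
        inv = np.linalg.inv(np.eye(S) - gamma * P_pi)
        for a in range(P.shape[0]):
            D = (P_pi - P[a]) @ inv                 # row s gives (P_pi(s) - P_a(s)) V
            for s in np.where(policy != a)[0]:
                # Expert optimality in environment k:  D[s] @ R >= 0.
                A_ub.append(row(-D[s], None, 0.0, 0.0)); b_ub.append(0.0)
                # Epigraph:  t_{k,s} <= D[s] @ R  (t picks up the min over a).
                e = np.zeros(S); e[s] = 1.0
                A_ub.append(row(-D[s], k, e, 0.0)); b_ub.append(0.0)

    # L1 slack:  -u_s <= R_s <= u_s.
    for s in range(S):
        e = np.zeros(S); e[s] = 1.0
        A_ub.append(row(e, None, 0.0, -e)); b_ub.append(0.0)
        A_ub.append(row(-e, None, 0.0, -e)); b_ub.append(0.0)

    # Maximize sum_k sum_s t_{k,s} - l1 * ||R||_1  (linprog minimizes).
    c = np.zeros(n)
    c[S:S + K * S] = -1.0
    c[S + K * S:] = l1
    bounds = [(-r_max, r_max)] * S + [(None, None)] * (K * S) + [(0, None)] * S
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:S]

def chain(S, slip):
    """Toy 1-D chain with actions 0 = left, 1 = right and a slip probability."""
    P = np.zeros((2, S, S))
    for s in range(S):
        P[1, s, min(s + 1, S - 1)] += 1 - slip; P[1, s, s] += slip
        P[0, s, max(s - 1, 0)] += 1 - slip; P[0, s, s] += slip
    return P

# Two dynamics (deterministic and slippery), one shared expert policy: go right.
policy = np.ones(4, dtype=int)
r = multi_env_lp_irl([(chain(4, 0.0), policy), (chain(4, 0.3), policy)])
print(r)
```

Because the recovered R must satisfy the optimality constraints of both dynamics simultaneously, "always go right" stays optimal in each environment; with the L1 penalty active, the estimated reward concentrates at the rightmost (goal) state.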

Citation (APA)

Nakata, Y., & Arai, S. (2019). Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning. Transactions of the Japanese Society for Artificial Intelligence, 34(6). https://doi.org/10.1527/tjsai.B-J23
