Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward

Abstract

Though the reinforcement learning framework has achieved numerous successes, it requires careful shaping of a reward function that represents the objective of a task. There is a class of tasks in which an expert can demonstrate the optimal behavior but for which it is difficult to design a proper reward function. For such tasks, an inverse reinforcement learning approach is useful because it estimates a reward function from the expert's demonstrations. Most existing inverse reinforcement learning algorithms assume that the expert gives demonstrations in a single environment. However, an expert can also provide demonstrations of the same task in other environments that share a specific objective function. For example, although it is hard to represent the objective of a driving task explicitly, a driver can give demonstrations under multiple situations. In such cases, it is natural to use these demonstrations in multiple environments to estimate the expert's reward function. We formulate this problem as a Bayesian inverse reinforcement learning problem and propose a Markov chain Monte Carlo method to solve it. Experimental results show that the proposed method quantitatively outperforms existing methods.
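To make the setting concrete, the sketch below shows one way a Bayesian inverse reinforcement learning posterior over a tabular reward can be sampled with random-walk Metropolis-Hastings while pooling demonstrations collected under several transition dynamics. The Boltzmann likelihood, Gaussian prior, step size, and all function names are illustrative assumptions for small tabular MDPs, not the authors' exact formulation or hyperparameters.

import numpy as np

def q_values(P, reward, gamma=0.95, iters=200):
    """Tabular value iteration. P has shape (A, S, S); reward has shape (S,)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward[None, :] + gamma * (P @ V)    # shape (A, S)
        V = Q.max(axis=0)
    return Q

def log_likelihood(reward, envs, beta=5.0):
    """Sum of Boltzmann log-likelihoods of the demonstrations in every environment."""
    ll = 0.0
    for P, demos in envs:                        # demos: list of (state, action) pairs
        Q = q_values(P, reward)
        logits = beta * Q
        log_pi = logits - np.logaddexp.reduce(logits, axis=0, keepdims=True)
        ll += sum(log_pi[a, s] for s, a in demos)
    return ll

def mcmc_irl(envs, n_states, n_samples=2000, step=0.1, seed=0):
    """Random-walk Metropolis over the reward vector with a standard Gaussian prior."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n_states)
    lp = log_likelihood(r, envs) - 0.5 * np.dot(r, r)
    samples = []
    for _ in range(n_samples):
        r_new = r + step * rng.standard_normal(n_states)
        lp_new = log_likelihood(r_new, envs) - 0.5 * np.dot(r_new, r_new)
        if np.log(rng.random()) < lp_new - lp:   # accept or reject the proposal
            r, lp = r_new, lp_new
        samples.append(r.copy())
    # Posterior-mean reward estimate after discarding the first half as burn-in
    return np.mean(samples[n_samples // 2:], axis=0)

Because the log-likelihood sums over every environment's demonstrations, rewards that only explain the expert's behavior under one dynamics are penalized, which is the intuition behind estimating a reward that transfers across environments.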

CITATION STYLE

APA

Yusuke, N., & Sachiyo, A. (2020). Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward. Transactions of the Japanese Society for Artificial Intelligence, 35(1). https://doi.org/10.1527/tjsai.G-J73
