Bayesian Experience Reuse for Learning from Multiple Demonstrators

Abstract

Learning from Demonstrations (LfD) is a powerful approach for incorporating advice from experts in the form of demonstrations. However, demonstrations often come from multiple sub-optimal experts with conflicting goals, rendering them difficult to incorporate effectively in online settings. To address this, we formulate a quadratic program whose solution yields an adaptive weighting over experts that can be used to sample experts with relevant goals. To compare different source and target task goals safely, we model their uncertainty using normal-inverse-gamma priors, whose posteriors are learned from demonstrations using Bayesian neural networks with a shared encoder. Our resulting approach, which we call Bayesian Experience Reuse, can be applied for LfD in static and dynamic decision-making settings. We demonstrate its effectiveness for minimizing multi-modal functions and optimizing a high-dimensional supply chain with cost uncertainty, where it is also shown to improve upon the performance of the demonstrators' policies.
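
As a rough illustration of the normal-inverse-gamma (NIG) uncertainty model mentioned in the abstract, the sketch below applies the standard textbook conjugate update for a Gaussian with unknown mean and variance. It is not the paper's procedure, which learns the posteriors with Bayesian neural networks; the names `NIG` and `nig_update` and the prior values are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NIG:
    """Normal-inverse-gamma parameters (hypothetical container for this sketch)."""
    mu: float     # prior mean of the unknown mean
    lam: float    # pseudo-count attached to the mean
    alpha: float  # inverse-gamma shape for the variance
    beta: float   # inverse-gamma scale for the variance


def nig_update(prior: NIG, xs: Sequence[float]) -> NIG:
    """Textbook conjugate update of an NIG prior given i.i.d. Gaussian observations."""
    n = len(xs)
    if n == 0:
        return prior
    xbar = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - xbar) ** 2 for x in xs)
    lam_n = prior.lam + n
    mu_n = (prior.lam * prior.mu + n * xbar) / lam_n
    alpha_n = prior.alpha + n / 2.0
    beta_n = (
        prior.beta
        + 0.5 * ss
        + prior.lam * n * (xbar - prior.mu) ** 2 / (2.0 * lam_n)
    )
    return NIG(mu_n, lam_n, alpha_n, beta_n)


# Assumed prior and toy observations, chosen only to make the arithmetic easy to follow.
posterior = nig_update(NIG(mu=0.0, lam=1.0, alpha=1.0, beta=1.0), [1.0, 2.0, 3.0])
print(posterior)
```

The posterior keeps both a location estimate (`mu`) and a description of how uncertain the variance still is (`alpha`, `beta`), which is the kind of information needed to compare goals cautiously when few demonstrations are available.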

Cite

Gimelfarb, M., Sanner, S., & Lee, C. G. (2021). Bayesian Experience Reuse for Learning from Multiple Demonstrators. In IJCAI International Joint Conference on Artificial Intelligence (pp. 2425–2431). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/334