Interactive reinforcement learning with dynamic reuse of prior knowledge from human and agent demonstrations

Zhaodong Wang; Matthew E. Taylor

IJCAI International Joint Conference on Artificial Intelligence (2019), Vol. 2019-August, pp. 3820–3827

DOI: 10.24963/ijcai.2019/530

Abstract

Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance. This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with an online confidence-based performance analysis. DRoP leverages the demonstrator's knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent.
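The abstract describes DRoP only at a high level. The Python code below is a minimal, hypothetical sketch of the general idea of confidence-based arbitration between a demonstration-derived prior policy and a learned tabular Q-policy. It is not the paper's DRoP update rule. The class name `ConfidenceArbiter`, the per-state confidence table, the confidence step size `beta`, the use of the one-step TD target as the confidence signal, and the `should_request_demo` threshold test are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class ConfidenceArbiter:
    """Hypothetical sketch of confidence-based reuse of a demonstration policy.

    Assumption: confidence for each source ("prior" or "learned") is a per-state
    running estimate of the one-step TD target observed after following that
    source. This is an illustrative rule, not the DRoP update from the paper.
    """

    def __init__(self, prior_policy, n_actions, alpha=0.1, gamma=0.95,
                 beta=0.2, epsilon=0.1, seed=0):
        self.prior_policy = prior_policy  # dict: state -> demonstrated action
        self.n_actions = n_actions
        self.alpha = alpha                # Q-learning step size
        self.gamma = gamma                # discount factor
        self.beta = beta                  # confidence step size (assumed)
        self.epsilon = epsilon            # exploration rate
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.conf = defaultdict(lambda: {"prior": 0.0, "learned": 0.0})
        self.rng = random.Random(seed)

    def choose(self, state):
        """Return (action, source), trusting the more confident source."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions), "explore"
        c = self.conf[state]
        # Ties favor the prior, so demonstrations are reused early in learning.
        if state in self.prior_policy and c["prior"] >= c["learned"]:
            return self.prior_policy[state], "prior"
        qs = self.q[state]
        # Greedy action of the learned policy (lowest index wins ties).
        return max(range(self.n_actions), key=qs.__getitem__), "learned"

    def update(self, state, action, source, reward, next_state, done):
        """Q-learning update plus a confidence update for the source used."""
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        if source in ("prior", "learned"):
            c = self.conf[state]
            c[source] += self.beta * (target - c[source])

    def should_request_demo(self, state, threshold):
        """Hypothetical trigger: ask for a new demonstration when neither
        source is confident enough in this state."""
        c = self.conf[state]
        return max(c["prior"], c["learned"]) < threshold
```

Because ties go to the prior, the agent follows the demonstration until the prior's confidence in a state falls below that of the learned policy, at which point it switches to its own greedy action. A softer variant could sample the source with a probability based on the confidence gap instead of always picking the more confident one.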

Citation (APA)

Wang, Z., & Taylor, M. E. (2019). Interactive reinforcement learning with dynamic reuse of prior knowledge from human and agent demonstrations. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 3820–3827). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/530