Disagreement Options: Task Adaptation Through Temporally Extended Actions

Abstract

Embodied AI agents, which learn through interaction with a physical environment, typically require large amounts of environment interaction to learn how to solve new tasks. Training can be parallelized using simulated environments. However, once deployed in, e.g., a real-world setting, it is not yet clear how an agent can quickly adapt its knowledge to solve new tasks. In this paper, we propose a novel Hierarchical Reinforcement Learning (HRL) method that allows an agent, when confronted with a novel task, to switch between exploiting prior knowledge through temporally extended actions and exploring the environment. We resolve this trade-off by utilizing the disagreement between the action distributions of selected previously acquired policies. Relevant prior tasks are selected by measuring the cosine similarity of their attached natural language goals in a pre-trained word embedding. We analyze the resulting temporal abstractions and experimentally demonstrate their effectiveness in different environments. We show that our method can solve new tasks using only a fraction of the environment interactions required when learning the task from scratch.
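The sketch below illustrates the two ingredients described in the abstract: selecting relevant prior policies by the cosine similarity of their goal embeddings, and using disagreement between the selected policies' action distributions to decide between exploiting prior knowledge and exploring. It is a minimal illustration, not the paper's implementation: the similarity threshold, the total-variation disagreement measure, and the random toy embeddings are assumptions made for the example.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_prior_policies(goal_vec, prior_goals, threshold=0.3):
    # prior_goals: dict mapping policy id -> goal embedding of its training task.
    # Keep the policies whose goal embedding is similar enough to the new goal.
    return [pid for pid, vec in prior_goals.items()
            if cosine_similarity(goal_vec, vec) >= threshold]

def action_disagreement(action_dists):
    # action_dists: one categorical action distribution per selected policy.
    # Disagreement is measured here as the mean total-variation distance
    # to the average distribution (an illustrative choice).
    dists = np.asarray(action_dists)
    mean_dist = dists.mean(axis=0)
    return float(np.mean(np.abs(dists - mean_dist).sum(axis=1) / 2.0))

# Toy usage with hypothetical goal embeddings and action distributions.
rng = np.random.default_rng(0)
prior_goals = {"open door": rng.normal(size=8), "pick up key": rng.normal(size=8)}
new_goal = prior_goals["open door"] + 0.1 * rng.normal(size=8)  # close to "open door"
selected = select_prior_policies(new_goal, prior_goals)
print("selected prior policies:", selected)

dists = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])]
if action_disagreement(dists) > 0.4:
    print("high disagreement -> explore the environment")
else:
    print("low disagreement -> exploit prior policies as a temporally extended action")

In this sketch, low disagreement suggests the selected prior policies agree on what to do and can be followed as a temporally extended action, while high disagreement triggers exploration; the paper's exact switching criterion may differ.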

Cite

CITATION STYLE

APA

Hutsebaut-Buysse, M., Schepper, T. D., Mets, K., & Latré, S. (2021). Disagreement Options: Task Adaptation Through Temporally Extended Actions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12975 LNAI, pp. 190–205). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-86486-6_12
