Options in Multi-task Reinforcement Learning - Transfer via Reflection

Abstract

Temporally extended actions such as options are known to lead to improvements in reinforcement learning (RL). At the same time, transfer learning across different RL tasks is an increasingly active area of research. Following Baxter’s formalism for transfer, the corresponding RL question considers the benefit that an RL agent can achieve on new tasks based on experience from previous tasks in a common “learning environment”. We address this in the specific context of goal-based multi-task RL, where the different tasks correspond to different goal states within a common state space, and we introduce Landmark Options Via Reflection (LOVR), a flexible framework that uses options to transfer domain knowledge. As an explicit analog of principles in transfer learning, we provide theoretical and empirical results demonstrating that, when a set of landmark states suitably covers the state space, a LOVR agent that learns optimal value functions for these landmarks in an initial phase and deploys the associated optimal policies as options in the main phase achieves a drastic reduction in cumulative regret compared to baseline approaches.
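The following is a minimal sketch (in Python) of the two-phase idea described in the abstract, under simplifying assumptions: a small deterministic gridworld, tabular Q-learning, and hand-picked landmark states. The environment, landmark placement, and helper names (GridWorld, q_learn, run_option) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class GridWorld:
    """Simple n x n deterministic gridworld; actions: up, down, left, right."""
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, n=8):
        self.n = n

    def step(self, state, action):
        r, c = divmod(state, self.n)
        dr, dc = self.MOVES[action]
        r = min(max(r + dr, 0), self.n - 1)
        c = min(max(c + dc, 0), self.n - 1)
        return r * self.n + c

def q_learn(env, goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning toward a single goal state; returns the Q-table."""
    n_states, n_actions = env.n * env.n, 4
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(200):
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2 = env.step(s, a)
            reward, done = (1.0, True) if s2 == goal else (0.0, False)
            Q[s, a] += alpha * (reward + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s = s2
            if done:
                break
    return Q

env = GridWorld(n=8)

# Phase 1 ("reflection"): learn value functions / policies for a small set of
# landmark states that covers the grid (landmark placement assumed here).
landmarks = [0, 7, 27, 56, 63]
landmark_Q = {g: q_learn(env, g) for g in landmarks}

# Phase 2: for a new goal task, deploy a landmark policy as an option
# (follow it greedily until its landmark is reached), then finish locally.
def run_option(state, landmark):
    """Execute the landmark option: the greedy policy of that landmark's Q-table."""
    Q = landmark_Q[landmark]
    for _ in range(200):
        if state == landmark:
            break
        state = env.step(state, int(Q[state].argmax()))
    return state

new_goal = 45
nearest = min(landmarks, key=lambda g: abs(g - new_goal))  # crude proxy for the closest landmark
s = run_option(0, nearest)    # jump to the landmark near the new goal...
print("reached landmark", s)  # ...a base learner would then solve the remaining local subtask.
```

In this sketch the landmark options act as reusable macro-actions across tasks: only the short local segment from landmark to the new goal still has to be learned, which is the mechanism behind the regret reduction claimed above.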

Cite (APA)

Denis, N., & Fraser, M. (2019). Options in Multi-task Reinforcement Learning - Transfer via Reflection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11489 LNAI, pp. 225–237). Springer Verlag. https://doi.org/10.1007/978-3-030-18305-9_18
