Safe and sample-efficient reinforcement learning algorithms for factored environments


Abstract

Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) whose transition function is unknown. In settings where an arbitrary policy π is already being executed and the experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms might perform worse than π. Our goal is to develop safe RL algorithms, where the agent has high confidence that, given D, the performance of π′ is better than the performance of π. To develop sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
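The safety criterion described above — only deploying π′ when there is high confidence it outperforms the behavior policy π on batch D — can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the importance-sampling estimator, the Hoeffding-style lower bound, the dictionary policy representation, and all function names are assumptions made for this example.

```python
import math

def is_returns(batch, pi_new, pi_b, gamma=0.99):
    """Per-episode importance-sampled returns of pi_new, estimated from
    episodes collected under the behavior policy pi_b.
    Each episode is a list of (state, action, reward) tuples; policies
    are nested dicts mapping state -> action -> probability."""
    returns = []
    for episode in batch:
        weight, g, discount = 1.0, 0.0, 1.0
        for (s, a, r) in episode:
            weight *= pi_new[s][a] / pi_b[s][a]  # importance weight
            g += discount * r                    # discounted return
            discount *= gamma
        returns.append(weight * g)
    return returns

def safe_improvement(batch, pi_new, pi_b, baseline, delta=0.05):
    """Accept pi_new only if a (1 - delta)-confidence lower bound on its
    estimated return exceeds the baseline performance of pi_b."""
    rets = is_returns(batch, pi_new, pi_b)
    n = len(rets)
    mean = sum(rets) / n
    # Hoeffding-style bound; assumes weighted returns bounded by b.
    b = max(abs(x) for x in rets) or 1.0
    lower = mean - b * math.sqrt(math.log(1 / delta) / (2 * n))
    return lower > baseline
```

Under this sketch, a candidate policy is rejected whenever the batch is too small or the behavior policy's coverage is too poor for the lower bound to clear the baseline, which is exactly why sample efficiency matters for safe policy improvement.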

Citation (APA)

Simão, T. D. (2019). Safe and sample-efficient reinforcement learning algorithms for factored environments. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 6460–6461). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/919
