Behavior Constraining in Weight Space for Offline Reinforcement Learning


Abstract

In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
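The core idea can be illustrated with a small sketch: rather than penalizing a divergence between action distributions, the trained policy's parameters are kept close to those of a policy behavior-cloned from the dataset. The PyTorch snippet below is only an illustrative sketch under this reading, not the authors' implementation; the network sizes, the penalty weight lam, and the placeholder improvement objective are assumptions.

# Illustrative sketch only -- not the paper's implementation.
# A weight-space constraint: keep the trained policy's parameters close to
# those of a behavior-cloned policy, instead of penalizing an action-space
# divergence. Sizes, lam, and the surrogate objective are hypothetical.
import torch
import torch.nn as nn

def make_policy():
    return nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

behavior_policy = make_policy()   # assumed: already fitted to the dataset via behavior cloning
policy = make_policy()
policy.load_state_dict(behavior_policy.state_dict())  # start training at the behavior policy

optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
lam = 1.0  # strength of the weight-space constraint (hypothetical value)

def weight_space_penalty(policy, behavior_policy):
    # squared L2 distance between the two parameter vectors
    return sum(((p - b.detach()) ** 2).sum()
               for p, b in zip(policy.parameters(), behavior_policy.parameters()))

def policy_objective(policy, batch):
    # placeholder for any off-policy improvement objective, e.g. -Q(s, pi(s))
    return -policy(batch["states"]).pow(2).mean()

batch = {"states": torch.randn(32, 4)}
loss = policy_objective(policy, batch) + lam * weight_space_penalty(policy, behavior_policy)
optimizer.zero_grad()
loss.backward()
optimizer.step()

A soft penalty is only one way to realize the idea; the same sketch could project the weights back onto a ball around the behavior-cloned parameters after each update.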

Citation (APA)

Swazinna, P., Udluft, S., Hein, D., & Runkler, T. (2021). Behavior Constraining in Weight Space for Offline Reinforcement Learning. In ESANN 2021 Proceedings - 29th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 329–334). i6doc.com publication. https://doi.org/10.14428/esann/2021.ES2021-83
