Abstract
In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
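To make the contrast concrete, the following is a minimal PyTorch sketch of a weight-space constraint: the penalty is taken as a squared L2 distance between the trained policy's parameters and those of a fixed reference policy, e.g. one behavior-cloned on the dataset. The specific distance measure, the network architecture, and the names (`weight_space_penalty`, `bc_policy`, `lam`) are illustrative assumptions for this sketch, not the exact formulation from the paper.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Small deterministic policy network (architecture is an illustrative assumption)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

def weight_space_penalty(policy: nn.Module, reference_policy: nn.Module) -> torch.Tensor:
    """Squared L2 distance between the trained policy's weights and those of a
    fixed reference policy (e.g. behavior-cloned on the offline dataset).
    The L2 distance is an assumption here; the paper may use a different
    weight-space constraint."""
    penalty = torch.zeros(())
    for p, p_ref in zip(policy.parameters(), reference_policy.parameters()):
        penalty = penalty + ((p - p_ref.detach()) ** 2).sum()
    return penalty

# Schematic use in an offline training step: maximize the estimated return
# while keeping the policy weights close to the behavior-cloned reference.
# loss = -estimated_return(policy, batch) + lam * weight_space_penalty(policy, bc_policy)
```

In contrast, the commonly used action-space regularization would compute a divergence (e.g. KL or MMD) between the action distributions of the trained and the data-generating policy on dataset states; the sketch above replaces that term with a distance measured directly on the network weights.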
Citation
Swazinna, P., Udluft, S., Hein, D., & Runkler, T. (2021). Behavior Constraining in Weight Space for Offline Reinforcement Learning. In ESANN 2021 Proceedings - 29th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 329–334). i6doc.com publication. https://doi.org/10.14428/esann/2021.ES2021-83