Model-Based Offline Planning with Trajectory Pruning


Abstract

Recent offline reinforcement learning (RL) studies have made substantial progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world control tasks, such as computational restrictions during agent training and the need for extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings, and simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollouts guided by a behavior policy learned from the data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
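
As a rough illustration of the idea described above (not the paper's exact algorithm), the sketch below rolls out candidate trajectories under a learned dynamics ensemble, samples actions from a behavior policy learned from the data, and prunes trajectories whose ensemble disagreement exceeds a threshold. The objects `dynamics_ensemble`, `behavior_policy`, `reward_fn`, and the uncertainty threshold are hypothetical stand-ins for whatever models and hyperparameters the method actually uses.

```python
import numpy as np

def plan_action(state, dynamics_ensemble, behavior_policy, reward_fn,
                horizon=10, n_trajs=256, max_uncertainty=1.0):
    """Behavior-guided rollout with trajectory pruning (illustrative sketch)."""
    returns, first_actions, keep = [], [], []
    for _ in range(n_trajs):
        s, total, ok, a0 = state, 0.0, True, None
        for t in range(horizon):
            # Sample an action from the behavior policy learned from data.
            a = behavior_policy.sample(s)
            if t == 0:
                a0 = a
            # Predict the next state with each ensemble member; disagreement
            # serves as an out-of-distribution (uncertainty) signal.
            preds = np.stack([m.predict(s, a) for m in dynamics_ensemble])
            if preds.std(axis=0).max() > max_uncertainty:
                ok = False  # prune trajectories that leave the data support
                break
            s = preds.mean(axis=0)
            total += reward_fn(s, a)
        returns.append(total)
        first_actions.append(a0)
        keep.append(ok)
    # Execute the first action of the best surviving trajectory.
    kept = [i for i, k in enumerate(keep) if k]
    best = max(kept, key=lambda i: returns[i]) if kept else int(np.argmax(returns))
    return first_actions[best]
```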

Citation (APA)

Zhan, X., Zhu, X., & Xu, H. (2022). Model-Based Offline Planning with Trajectory Pruning. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3716–3722). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2022/516
