Safe Reinforcement Learning via Statistical Model Predictive Shielding

Abstract

Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety—e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy—it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
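
The switching rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' SMPS algorithm: it uses a deterministic toy system (a point mass on a line) and checks the backup policy by simulating it forward, whereas SMPS relies on sampling-based verification and linear systems analysis. All names, constants, and dynamics below (`step`, `backup_policy`, `backup_keeps_safe`, `shielded_policy`, `DT`, `X_MAX`, `A_MAX`) are assumptions made for illustration.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position, velocity); hypothetical toy system

DT = 0.1      # time step (assumed)
X_MAX = 1.0   # safe region is |position| <= X_MAX (assumed)
A_MAX = 1.0   # acceleration limit (assumed)


def step(state: State, action: float) -> State:
    """Deterministic point-mass dynamics with a clipped acceleration."""
    x, v = state
    a = max(-A_MAX, min(A_MAX, action))
    return (x + v * DT, v + a * DT)


def is_safe(state: State) -> bool:
    return abs(state[0]) <= X_MAX


def backup_policy(state: State) -> float:
    """Brake: push the velocity toward zero without overshooting it."""
    _, v = state
    return max(-A_MAX, min(A_MAX, -v / DT))


def backup_keeps_safe(state: State, horizon: int = 100) -> bool:
    """Simulate the backup policy; succeed if it stops inside the safe region."""
    for _ in range(horizon):
        if not is_safe(state):
            return False
        if abs(state[1]) < 1e-9:  # at rest inside the safe region
            return True
        state = step(state, backup_policy(state))
    return False  # could not confirm safety within the horizon


def shielded_policy(state: State, learned_policy: Callable[[State], float]) -> float:
    """Use the learned action only if the backup policy can still recover afterwards."""
    proposed = learned_policy(state)
    if backup_keeps_safe(step(state, proposed)):
        return proposed
    return backup_policy(state)
```

In this sketch, a learned policy that always accelerates toward the boundary is overridden once braking would no longer stop the system inside the safe region. The guarantee only holds if the initial state is itself one from which the backup policy is safe.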

Cite

Bastani, O., Li, S., & Xu, A. (2021). Safe Reinforcement Learning via Statistical Model Predictive Shielding. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2021.XVII.026
