Online Shielding for Stochastic Systems

15Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose a method to develop trustworthy reinforcement learning systems. To ensure safety especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches compute exhaustively the safety of all states-action combinations ahead-of-time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to the look-ups in huge databases. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a 2-player version of the classical computer game Snake. The game requires fast decisions and the multiplayer setting induces a large state space, computationally expensive to analyze exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.

Cite

CITATION STYLE

APA

Könighofer, B., Rudolf, J., Palmisano, A., Tappler, M., & Bloem, R. (2021). Online Shielding for Stochastic Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12673 LNCS, pp. 231–248). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-76384-8_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free