Reinforcement learning algorithms discover policies that maximize reward. However, the learned policies generally provide no safety guarantees, leaving safety in reinforcement learning (and in artificial intelligence more broadly) an open research problem. Shield synthesis is a formal approach that constructs a correct-by-construction reactive system, called a shield, which enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during both the learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed in linear temporal logic. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. The paper summarizes the application areas, advantages, disadvantages, and synthesis approaches for the three types of shields, and gives an overview of experimental results.
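To illustrate the core idea of shielding, the following is a minimal sketch of a deterministic shield wrapping an agent's action choices. All names here (make_shield, safe_actions, the toy states and actions) are illustrative assumptions, not part of the paper; a real shield is synthesized automatically from the safety specification, and this sketch only shows how a precomputed state-to-safe-actions map would be used at runtime.

```python
# Sketch of a deterministic shield: a lookup from each state to the set of
# actions the synthesized shield permits there. At runtime the shield passes
# safe agent actions through unchanged and overrides unsafe ones.

def make_shield(safe_actions):
    """safe_actions: dict mapping each state to the nonempty set of actions
    that keep the system within the safety specification (precomputed)."""
    def shield(state, proposed_action):
        allowed = safe_actions[state]
        if proposed_action in allowed:
            return proposed_action          # agent's choice is safe: pass through
        return next(iter(allowed))          # otherwise substitute a safe action
    return shield

# Toy usage: in state "near_edge" only "brake" is considered safe.
shield = make_shield({
    "near_edge": {"brake"},
    "open_road": {"accelerate", "brake"},
})
assert shield("near_edge", "accelerate") == "brake"      # unsafe action overridden
assert shield("open_road", "accelerate") == "accelerate" # safe action passes through
```

The key property, minimal interference, shows up in the pass-through branch: the shield only acts when the agent's proposed action would violate the specification.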
Könighofer, B., Lorber, F., Jansen, N., & Bloem, R. (2020). Shield Synthesis for Reinforcement Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12476 LNCS, pp. 290–306). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61362-4_16