WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning


Abstract

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast the problem as constrained reinforcement learning, where the expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the cost distribution: in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic (WCSAC), which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a chosen level of conditional Value-at-Risk (CVaR) of the cost distribution serves as the safety measure used to judge constraint satisfaction, and it guides the adaptation of safety weights that trade off reward against safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control than expectation-based methods.
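To make the abstract's two key ideas concrete, the sketch below illustrates (i) the CVaR of a cost distribution as a worst-case safety measure and (ii) a Lagrangian-style update of an adaptive safety weight. This is a minimal illustration under assumptions of our own, not the authors' implementation: the Gaussian cost model, the bisection quantile routine, the function names (gaussian_cvar, update_safety_weight), and the learning rate are all illustrative.

```python
# Minimal sketch (not the authors' code) of CVaR-based risk control and an
# adaptive safety weight, assuming the discounted-cost distribution is
# approximated as Gaussian so that its CVaR has a closed form.
import math


def _std_normal_quantile(p):
    """Inverse CDF of the standard normal, via bisection (illustrative only)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gaussian_cvar(mean, std, alpha):
    """CVaR_alpha of a Gaussian cost N(mean, std^2): the expected cost over
    the worst alpha-fraction of outcomes, for alpha in (0, 1]."""
    z = _std_normal_quantile(1.0 - alpha)          # (1 - alpha)-quantile
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mean + std * pdf / alpha


def update_safety_weight(weight, cvar, cost_limit, lr=1e-2):
    """Increase the safety weight when the CVaR of the cost exceeds the
    budget, decrease it otherwise (a Lagrangian-style trade-off)."""
    return max(0.0, weight + lr * (cvar - cost_limit))


# Example: a policy whose episode cost has mean 20 and std 10, judged at the
# 10% risk level against a cost limit of 25.
cvar = gaussian_cvar(mean=20.0, std=10.0, alpha=0.1)
weight = update_safety_weight(weight=1.0, cvar=cvar, cost_limit=25.0)
print(f"CVaR_0.1 = {cvar:.2f}, new safety weight = {weight:.3f}")
```

Under the Gaussian assumption, CVaR at level alpha has the closed form mean + std * phi(z_{1-alpha}) / alpha, so a smaller (more risk-averse) alpha puts more weight on the tail and tightens the effective constraint, whereas alpha = 1 recovers the plain expectation used by expectation-based methods.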

Citation (APA)

Yang, Q., Simão, T. D., Tindemans, S. H., & Spaan, M. T. J. (2021). WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 12A, pp. 10639–10646). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i12.17272
