SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems


Abstract

Warning: this paper contains examples that may be offensive or upsetting. The social impact of natural language processing and its applications has received increasing attention. In this position paper, we focus on the problem of safety for end-to-end conversational AI. We survey the problem landscape therein, introducing a taxonomy of three observed phenomena: the INSTIGATOR, YEA-SAYER, and IMPOSTOR effects. We then empirically assess the extent to which current tools can measure these effects and current systems display them. We release these tools as part of a “first aid kit” (SAFETYKIT) to quickly assess apparent safety concerns. Our results show that, while current tools are able to provide an estimate of the relative safety of systems in various settings, they still have several shortcomings. We suggest several future directions and discuss ethical considerations.

Citation (APA)

Dinan, E., Abercrombie, G., Bergman, A. S., Spruit, S., Hovy, D., Boureau, Y. L., & Rieser, V. (2022). SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4113–4133). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.284
