HONEST: Measuring Hurtful Sentence Completion in Language Models

Debora Nozza; Federico Bianchi; Dirk Hovy

Conference ProceedingsOPEN ACCESS

HONEST: Measuring Hurtful Sentence Completion in Language Models

NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (2021) 2398-2406

DOI: 10.18653/v1/2021.naacl-main.191

82Citations

90Readers

Abstract

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially in text generation. Our results show that 4.3% of the time, language models complete a sentence with a hurtful word. These cases are not random, but follow language and gender-specific patterns. We propose a score to measure hurtful sentence completions in language models (HONEST). It uses a systematic template- and lexicon-based bias evaluation methodology for six languages. Our findings suggest that these models replicate and amplify deep-seated societal stereotypes about gender roles. Sentence completions refer to sexual promiscuity when the target is female in 9% of the time, and in 4% to homosexuality when the target is male. The results raise questions about the use of these models in production settings.

Cite

CITATION STYLE

APA

Nozza, D., Bianchi, F., & Hovy, D. (2021). HONEST: Measuring Hurtful Sentence Completion in Language Models. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 2398–2406). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.191

HONEST: Measuring Hurtful Sentence Completion in Language Models

Abstract

Cite

Register to see more suggestions