Flexible text generation for counterfactual fairness probing

Abstract

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that do not account for grammar, context, or subtle references to sensitive attributes, and that can miss issues the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot. We compare the performance of various counterfactual generation methods on the Civil Comments dataset and show their value in evaluating a toxicity classifier.
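The probing setup the abstract describes can be illustrated with a short sketch: generate a counterfactual for a sensitive attribute in the input, score both versions with a classifier, and flag pairs whose scores diverge. The sketch below uses the naive wordlist substitution the paper critiques as a baseline; the paper's contribution is to replace that step with LLM-generated counterfactuals that respect grammar and context. All names here (`IDENTITY_SWAPS`, `probe_fairness`, the toy classifier) are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Dict, List

# Hypothetical wordlist pairing sensitive-attribute terms (illustrative only).
IDENTITY_SWAPS: Dict[str, str] = {
    "women": "men",
    "men": "women",
    "christian": "muslim",
    "muslim": "christian",
}


def wordlist_counterfactuals(text: str) -> List[str]:
    """Generate naive counterfactuals by swapping wordlist terms in place."""
    counterfactuals = []
    tokens = text.split()
    for i, tok in enumerate(tokens):
        swap = IDENTITY_SWAPS.get(tok.lower())
        if swap:
            swapped = tokens.copy()
            swapped[i] = swap
            counterfactuals.append(" ".join(swapped))
    return counterfactuals


def probe_fairness(
    text: str,
    classify_toxicity: Callable[[str], float],
    threshold: float = 0.1,
) -> List[dict]:
    """Flag counterfactual pairs whose classifier scores diverge."""
    original_score = classify_toxicity(text)
    findings = []
    for cf in wordlist_counterfactuals(text):
        gap = abs(classify_toxicity(cf) - original_score)
        if gap > threshold:
            findings.append({"original": text, "counterfactual": cf, "gap": gap})
    return findings


if __name__ == "__main__":
    # Stand-in classifier for demonstration only; substitute any real toxicity model.
    def toy_classifier(text: str) -> float:
        return 0.9 if "muslim" in text.lower() else 0.2

    print(probe_fairness("I think christian holidays are great", toy_classifier))
```

In the paper's setting, `wordlist_counterfactuals` would be replaced by LLM-generated rewrites, which can handle grammatical agreement and attribute references that simple token swaps cannot.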

Citation (APA)
Fryer, Z., Axelrod, V., Packer, B., Beutel, A., Chen, J., & Webster, K. (2022). Flexible text generation for counterfactual fairness probing. In WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop (pp. 209–229). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.woah-1.20
