Warning: This paper discusses and contains offensive or upsetting content. Many hate speech detectors are now built to flag hateful content automatically. However, their training sets are sometimes skewed toward certain stereotypes (e.g., related to race or religion), so the detectors come to rely on shortcuts for their predictions. Previous work focuses mainly on token-level analysis and depends heavily on human experts' annotations to identify spurious correlations, which is not only costly but also unable to uncover higher-level artifacts. In this work, we use grammar induction to find grammar patterns of hate speech and analyze this phenomenon from a causal perspective. Concretely, we categorize and verify different biases based on their spuriousness and influence on model predictions. We then propose two mitigation approaches, Multi-Task Intervention and Data-Specific Intervention, targeting these confounders. Experiments conducted on 9 hate speech datasets demonstrate the effectiveness of our approaches. The code is available at https://github.com/SALT-NLP/Bias_Hate_Causal.
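For context, the token-level spurious-correlation analysis that the abstract contrasts with can be illustrated with a toy pointwise mutual information (PMI) computation between tokens and labels. This is a minimal sketch; the corpus, tokens, and function names below are hypothetical illustrations, not the paper's method or datasets:

```python
from math import log

# Hypothetical toy corpus of (text, label) pairs; label 1 = hateful.
corpus = [
    ("group X are terrible", 1),
    ("I dislike group X", 1),
    ("group X held a festival", 0),
    ("the weather is terrible", 0),
    ("I love this festival", 0),
    ("they are awful people", 1),
]

def token_label_pmi(corpus, token, label=1):
    """PMI between a token's presence and a label.

    A high PMI suggests the token may act as a shortcut feature:
    the model can predict the label from the token alone, regardless
    of whether the token is causally related to hatefulness.
    """
    n = len(corpus)
    p_label = sum(1 for _, y in corpus if y == label) / n
    p_token = sum(1 for t, _ in corpus if token in t.split()) / n
    p_joint = sum(1 for t, y in corpus if token in t.split() and y == label) / n
    if p_joint == 0:
        return float("-inf")
    return log(p_joint / (p_label * p_token))
```

On this toy data, a group-identity token like "X" scores a positive PMI with the hateful label even though it is not itself hateful, which is exactly the kind of spurious token-level correlation the abstract describes; higher-level (e.g., grammatical) artifacts would not surface in such a per-token analysis.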
Zhang, Z., Chen, J., & Yang, D. (2023). Mitigating Biases in Hate Speech Detection from A Causal Perspective. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 6610–6625). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.440