Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts

Skyler Hallinan; Alisa Liu; Yejin Choi; Maarten Sap

Conference Proceedings

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 2, pp. 228–242

DOI: 10.18653/v1/2023.acl-short.21

Abstract

Text detoxification has the potential to mitigate the harms of toxicity by rephrasing text to remove offensive meaning, but subtle toxicity remains challenging to tackle. We introduce MARCO, a detoxification algorithm that combines controllable generation and text rewriting methods using a Product of Experts with autoencoder language models (LMs). MARCO uses likelihoods under a non-toxic LM (expert) and a toxic LM (anti-expert) to find candidate words to mask and replace. We evaluate our method on several subtle toxicity and microaggressions datasets, and show that it not only outperforms baselines on automatic metrics, but MARCO’s rewrites are preferred 2.1× more in human evaluation. Its applicability to instances of subtle toxicity is especially promising, demonstrating a path forward for addressing increasingly elusive online hate.
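The two ideas named in the abstract, scoring positions by how much the expert and anti-expert disagree and then replacing masked words under a Product of Experts, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the function names, the use of Jensen-Shannon divergence as the disagreement score, the threshold, the `alpha` weight, and the logit-space rule `base + alpha * (expert - anti_expert)`, which is one common way to write a Product of Experts. The distributions are hand-written numbers, not outputs of real language models.

```python
import math


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (assumed score)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def positions_to_mask(expert_dists, anti_dists, threshold):
    """Return token positions where expert and anti-expert disagree strongly."""
    return [
        i
        for i, (p, q) in enumerate(zip(expert_dists, anti_dists))
        if js_divergence(p, q) > threshold
    ]


def product_of_experts(base_logits, expert_logits, anti_logits, alpha=1.0):
    """Hypothetical combination rule: boost the expert, penalise the anti-expert."""
    combined = [
        b + alpha * (e - a)
        for b, e, a in zip(base_logits, expert_logits, anti_logits)
    ]
    return softmax(combined)


# Toy example: two token positions, each with a two-word vocabulary.
expert_dists = [[0.5, 0.5], [0.9, 0.1]]
anti_dists = [[0.5, 0.5], [0.1, 0.9]]
masked = positions_to_mask(expert_dists, anti_dists, threshold=0.1)

# Toy replacement step over a three-word vocabulary at one masked position.
replacement_probs = product_of_experts(
    base_logits=[0.0, 0.0, 0.0],
    expert_logits=[2.0, 0.0, 0.0],
    anti_logits=[0.0, 2.0, 0.0],
)
best_index = max(range(len(replacement_probs)), key=replacement_probs.__getitem__)
```

In this toy setup only the second position is selected for masking, because that is where the two distributions diverge, and the replacement distribution favours the word the expert prefers and the anti-expert does not. A real system would obtain these distributions from trained language models rather than fixed lists.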

Citation (APA)

Hallinan, S., Liu, A., Choi, Y., & Sap, M. (2023). Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 228–242). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-short.21
