Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

Abstract

Disclaimer: The samples presented in this paper may be considered offensive or vulgar.

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly. Existing datasets lack fine-grained annotation of toxic types and expressions and ignore samples with indirect toxicity. In addition, introducing lexical knowledge is crucial for detecting the toxicity of posts, yet doing so has remained a challenge for researchers. In this paper, we facilitate the fine-grained detection of Chinese toxic language. First, we build MONITOR TOXIC FRAME, a hierarchical taxonomy for analyzing toxic types and expressions. Then, we present TOXICN, a fine-grained dataset that includes both direct and indirect toxic samples. We also build an insult lexicon containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as a benchmark, which incorporates lexical features to detect toxic language. In the experimental stage, we demonstrate the effectiveness of TKE. Finally, we give a systematic quantitative and qualitative analysis of the findings.

Cite

Lu, J., Xu, B., Zhang, X., Min, C., Yang, L., & Lin, H. (2023). Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 16235–16250). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.898
