DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis

Christoph Demus; Jonas Pitz; Mina Schütz; Nadine Probol; Melanie Siegel; Dirk Labudde

Conference proceedings, open access


WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop (2022) 143-153

DOI: 10.18653/v1/2022.woah-1.14


Abstract

In this work, we present a publicly available offensive language dataset (DeTox-dataset) containing 10,278 German social media comments collected in the first half of 2021. Six annotators labeled the comments across twelve annotation categories. This makes the dataset far more comprehensive than other datasets and extends its scope beyond hate speech detection. In particular, the labels also cover toxicity, criminal relevance, and types of discrimination. Furthermore, about half of the comments come from coherent parts of conversations. This makes it possible to take each comment's context into account and to carry out conversation analyses that investigate the contagion of offensive language in conversations. The dataset is available in our GitHub repository: https://github.com/hdaSprachtechnologie/detox
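Because about half of the comments come from coherent conversation threads, one simple way to probe contagion is to compare how often a reply is offensive when its parent comment is offensive versus when it is not. The sketch below illustrates that idea under loudly stated assumptions. The field names `comment_id`, `parent_id`, and `offensive` are hypothetical and are not the published DeTox schema. Collapsing the twelve annotation categories into a single boolean `offensive` flag is also an assumption. The toy comments are invented and are not drawn from the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Comment:
    # Hypothetical fields for illustration only; not the DeTox column names.
    comment_id: str
    parent_id: Optional[str]  # None marks a top-level comment
    offensive: bool           # assumed binary collapse of the annotation labels


def reply_offensiveness_rates(comments: List[Comment]) -> Tuple[float, float]:
    """Return (share of offensive replies to offensive parents,
    share of offensive replies to non-offensive parents)."""
    by_id = {c.comment_id: c for c in comments}
    # parent label -> [number of offensive replies, number of all replies]
    counts = {True: [0, 0], False: [0, 0]}
    for c in comments:
        # Skip top-level comments and replies whose parent is not in the sample.
        if c.parent_id is None or c.parent_id not in by_id:
            continue
        parent_label = by_id[c.parent_id].offensive
        counts[parent_label][1] += 1
        if c.offensive:
            counts[parent_label][0] += 1

    def rate(parent_label: bool) -> float:
        offensive_replies, total = counts[parent_label]
        return offensive_replies / total if total else float("nan")

    return rate(True), rate(False)


# Invented toy thread, not real DeTox data.
comments = [
    Comment("1", None, True),
    Comment("2", "1", True),
    Comment("3", "1", False),
    Comment("4", None, False),
    Comment("5", "4", False),
    Comment("6", "4", False),
    Comment("7", "2", True),
]
after_off, after_clean = reply_offensiveness_rates(comments)
print(f"{after_off:.2f} {after_clean:.2f}")  # prints "0.67 0.00"
```

A gap between the two rates is only a descriptive signal. A real analysis would also need to control for thread, topic, and author effects before attributing the difference to contagion.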

Citation (APA)

Demus, C., Pitz, J., Schütz, M., Probol, N., Siegel, M., & Labudde, D. (2022). DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis. In WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop (pp. 143–153). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.woah-1.14
