Bias and comparison framework for abusive language datasets

Maximilian Wich; Tobias Eder; Hala Al Kuwatly; Georg Groh

Journal Article, Open Access

AI and Ethics (2022) 2(1) 79-101

DOI: 10.1007/s43681-021-00081-0

Abstract

As research on the automatic detection of abusive language and hate speech has increased, numerous datasets have been produced. A problem with this diversity is that the datasets often differ in, among other things, context, platform, sampling process, collection strategy, and labeling schema. Existing surveys compare these datasets only superficially. Therefore, we developed a bias and comparison framework for the in-depth analysis of abusive language datasets and apply it to compare five English and six Arabic datasets. We make this framework available to researchers and data scientists who work with such datasets, so that they can be aware of the datasets' properties and take them into account in their work.

Cite

Wich, M., Eder, T., Al Kuwatly, H., & Groh, G. (2022). Bias and comparison framework for abusive language datasets. AI and Ethics, 2(1), 79–101. https://doi.org/10.1007/s43681-021-00081-0
