We present XHATE-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHATE-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHATE-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain- and language-adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in the zero-shot transfer setups.
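To illustrate why aligned test instances matter: when the same test items exist in every language, a score difference between two languages within one domain can be read as a language-transfer effect, and a difference between two domains within one language as a domain-transfer effect. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The function name `transfer_gaps`, the domain and language labels, and all score values are hypothetical placeholders rather than results from the paper.

```python
from typing import Dict, Tuple

# Key: (test_domain, test_language); value: some evaluation score in [0, 1].
Scores = Dict[Tuple[str, str], float]


def transfer_gaps(
    scores: Scores, src_domain: str, src_lang: str
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split performance drops into language gaps and domain gaps.

    Assumes the test sets are aligned across languages, so a change in
    language alone (same domain) or in domain alone (same language) can be
    compared against the in-domain, in-language baseline.
    """
    baseline = scores[(src_domain, src_lang)]

    # Same domain, different language: drop attributed to language transfer.
    language_gap = {
        lang: baseline - score
        for (domain, lang), score in scores.items()
        if domain == src_domain and lang != src_lang
    }

    # Same language, different domain: drop attributed to domain transfer.
    domain_gap = {
        domain: baseline - score
        for (domain, lang), score in scores.items()
        if lang == src_lang and domain != src_domain
    }
    return language_gap, domain_gap


# Placeholder scores for illustration only (NOT results from the paper).
toy_scores: Scores = {
    ("domain_A", "lang_src"): 0.8,
    ("domain_A", "lang_tgt"): 0.7,
    ("domain_B", "lang_src"): 0.6,
    ("domain_B", "lang_tgt"): 0.5,
}

language_gap, domain_gap = transfer_gaps(toy_scores, "domain_A", "lang_src")
```

Cells where both the domain and the language differ from the source are deliberately left out of both dictionaries, since they mix the two effects.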
Glavaš, G., Karan, M., & Vulić, I. (2020). XHATE-999: Analyzing and Detecting Abusive Language Across Domains and Languages. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 6350–6365). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.559