Rapidly changing social media content calls for robust and generalisable abuse detection models. However, state-of-the-art supervised models display degraded performance when evaluated on abusive comments that differ from the training corpus. We investigate whether the performance of supervised models for cross-corpora abuse detection can be improved by incorporating additional information from topic models, since the latter can infer latent topic mixtures for unseen samples. In particular, we combine topical information with representations from a model fine-tuned for classifying abusive comments. Our performance analysis reveals that topic models capture abuse-related topics that transfer across corpora, resulting in improved generalisability.
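The approach described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the toy corpus, labels, feature dimensions, and the use of scikit-learn's LDA and random vectors in place of the paper's topic model and fine-tuned classifier representations are all assumptions made for demonstration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Toy training corpus with illustrative abuse labels (1 = abusive).
train_texts = ["you are awful and stupid", "have a great day friend",
               "I hate you so much", "thanks for the kind help"]
train_labels = [1, 0, 1, 0]

# Bag-of-words counts feed the topic model.
vec = CountVectorizer()
counts = vec.fit_transform(train_texts)

# A small LDA stands in for the paper's topic model; crucially, it can
# infer topic mixtures for unseen documents via transform().
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(counts)

# Placeholder for representations from a model fine-tuned on abusive
# comments (random vectors here, purely for illustration).
rng = np.random.default_rng(0)
encoder_feats = rng.normal(size=(len(train_texts), 8))

# Combine topical information with the classifier representations,
# then train a simple downstream classifier on the joint features.
X = np.hstack([topic_feats, encoder_feats])
clf = LogisticRegression().fit(X, train_labels)

# The topic model generalises to a comment from a different corpus:
# it infers a latent topic mixture without retraining.
unseen = vec.transform(["you stupid awful person"])
unseen_topics = lda.transform(unseen)
print(unseen_topics.shape)  # one topic-mixture row per unseen document
```

The key property exploited here is that `transform()` yields a normalised topic distribution for any new document, so the topical features remain available at test time even for out-of-corpus samples.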
Citation
Bose, T., Illina, I., & Fohr, D. (2021). Generalisability of Topic Models in Cross-corpora Abusive Language Detection. In NLP4IF 2021 - NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, Proceedings of the 4th Workshop (pp. 51–56). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.nlp4if-1.8