Datasets of Slovene and Croatian Moderated News Comments

Abstract

This paper presents two large, newly constructed datasets of moderated news comments from two highly popular online news portals in their respective countries: the Slovene RTV MCC and the Croatian 24sata. The datasets are analyzed by manually annotating the types of content deleted by moderators and by investigating deletion trends among users and threads. Initial experiments on automatically detecting the deleted content in the datasets are then presented. Both datasets are published in encrypted form, so that others can run experiments on detecting content to be deleted without potentially inappropriate content being revealed. Finally, the baseline classification models trained on the non-encrypted datasets are also released to enable real-world use.

Cite

Ljubešić, N., Erjavec, T., & Fišer, D. (2018). Datasets of Slovene and Croatian Moderated News Comments. In 2nd Workshop on Abusive Language Online - Proceedings of the Workshop, co-located with EMNLP 2018 (pp. 124–131). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-5116
