Abusive language detection in YouTube comments leveraging replies as conversational context

Abstract

Hostility on social media is increasing, and many people suffer from online abuse and harassment as a result. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset consists of YouTube comments together with contextual information: replies, video, video title, and the original description. Each comment is labeled as abusive or not abusive and assigned one of three topics: politics, religion, or other. We discuss the refined annotation guidelines we used for this labeling. We report several strong baselines on this dataset for abusive language detection and topic classification, using a range of classifiers and text representations. We show that taking the conversational context (namely, the replies) into account greatly improves classification results compared with using only the linguistic features of the comments. We also study how classification accuracy depends on the topic of the comment.
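
To make the dataset description concrete, here is a minimal Python sketch of what one record and its two input variants (comment only versus comment plus replies) might look like. The abstract does not specify any of this: the field names, the `classifier_input` helper, and the `[SEP]` separator are assumptions made for illustration, and the example record is invented rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List

# The three topic labels named in the abstract.
TOPICS = ("politics", "religion", "other")


@dataclass
class CommentRecord:
    # Hypothetical field names; the published dataset may use different ones.
    comment: str
    abusive: bool
    topic: str
    replies: List[str] = field(default_factory=list)
    video_title: str = ""
    description: str = ""

    def __post_init__(self) -> None:
        # Reject topics outside the three classes listed in the abstract.
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic!r}")


def classifier_input(record: CommentRecord,
                     use_replies: bool = True,
                     sep: str = " [SEP] ") -> str:
    """Return the text a classifier would see, with or without replies.

    Joining with a separator is one simple way to add context; it is an
    assumption here, not necessarily what the authors did.
    """
    parts = [record.comment]
    if use_replies:
        parts.extend(record.replies)
    return sep.join(parts)


# Toy record, invented for illustration only.
example = CommentRecord(
    comment="nobody asked for your opinion",
    abusive=True,
    topic="other",
    replies=["that was uncalled for", "please keep it civil"],
)

comment_only = classifier_input(example, use_replies=False)
with_context = classifier_input(example, use_replies=True)
```

Feeding `comment_only` and `with_context` to the same classifier would give the kind of with-context versus without-context comparison the abstract reports, whatever text representation is used downstream.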

Cite

Ashraf, N., Zubiaga, A., & Gelbukh, A. (2021). Abusive language detection in youtube comments leveraging replies as conversational context. PeerJ Computer Science, 7. https://doi.org/10.7717/peerj-cs.742
