Categorizing Online Harassment on Twitter


Abstract

Harassment on social media is hard to tackle because these platforms are virtual spaces in which people are free to express themselves without restrictions. Furthermore, the large number of users posting on platforms such as Twitter makes sexist and sexually harassing content harder to control, which calls for robust Machine Learning (ML) methods. To that end, this work compares the performance of supervised ML algorithms for categorizing online harassment in Twitter posts. We tested Logistic Regression, Gaussian Naïve Bayes, Decision Trees, Random Forest, Linear SVM, Gaussian SVM, Polynomial SVM, Multi-Layer Perceptron, and AdaBoost on the SIMAH Competition benchmark data, using TF-IDF vectors and Word2Vec embeddings as features. We reached accuracy scores above 0.80 for all the harassment types in the data. We also show that, when using TF-IDF vectors, Linear and Gaussian SVM are the best methods for predicting harassment content, while Decision Trees and Random Forest better categorize physical and sexual harassment. Overall, TF-IDF vectors yielded higher performance on these data, suggesting that the training corpus used for Word2Vec negatively affected the classification outcomes.
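
To make the TF-IDF feature representation mentioned in the abstract concrete, here is a minimal sketch in plain Python. The abstract does not say which TF-IDF variant, tokenizer, or normalization the authors used, so the smoothed IDF formula, the whitespace tokenizer, and the L2 normalization below are assumptions chosen for illustration. The function name `tfidf_vectors` and the three example posts are hypothetical and are not taken from the SIMAH data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of short texts.

    Assumed variant: smoothed IDF, ln((1 + n) / (1 + df)) + 1,
    followed by L2 normalisation of each document vector.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        raw = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors.append([x / norm for x in raw])
    return vocab, vectors


# Invented placeholder posts, used only to exercise the function.
posts = ["have a nice day", "nice weather today", "a rude reply"]
vocab, vectors = tfidf_vectors(posts)
```

Each post becomes a fixed-length vector in which terms that are rare across the collection receive more weight than common ones. Vectors of this kind could then be passed to any of the supervised classifiers listed in the abstract.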

Cite

Saeidi, M., Samuel, S. B., Milios, E., Zeh, N., & Berton, L. (2020). Categorizing Online Harassment on Twitter. In Communications in Computer and Information Science (Vol. 1168 CCIS, pp. 283–297). Springer. https://doi.org/10.1007/978-3-030-43887-6_22
