Categorizing Online Harassment on Twitter

Mozhgan Saeidi; Samuel Bruno Samuel; Evangelos Milios; Norbert Zeh; Lilian Berton

Conference Proceedings (Open Access)

Communications in Computer and Information Science (2020) 1168 CCIS 283-297

DOI: 10.1007/978-3-030-43887-6_22

Abstract

Harassment on social media is hard to tackle because these platforms are virtual spaces in which people are free to express themselves without restrictions. Furthermore, the large number of users posting on platforms such as Twitter makes sexism and sexual harassment content harder to control, which calls for robust Machine Learning (ML) methods. This work therefore compares the performance of supervised ML algorithms for categorizing online harassment in Twitter posts. We tested Logistic Regression, Gaussian Naïve Bayes, Decision Trees, Random Forest, Linear SVM, Gaussian SVM, Polynomial SVM, Multi-Layer Perceptron, and AdaBoost on the SIMAH Competition benchmark data, using TF-IDF vectors and Word2Vec embeddings as features. We reached accuracy scores above 0.80 for all harassment types in the data. We also showed that, when using TF-IDF vectors, Linear and Gaussian SVM are the best methods for predicting harassment content, while Decision Trees and Random Forest better categorize physical and sexual harassment. Overall, TF-IDF vectors yielded higher performance on these data, suggesting that the training corpus used for Word2Vec negatively affected the classification outcomes.
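To make the TF-IDF feature step concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the exact weighting, so the smoothed IDF and L2 normalization used here are assumptions (a commonly used variant), the whitespace tokenizer is a simplification, and the function name `tfidf_vectors` and the sample posts are hypothetical placeholders rather than SIMAH data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using smoothed IDF and L2 normalization."""
    # Simplified tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Assumed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # L2-normalize so each document vector has unit length.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors.append([x / norm for x in raw])
    return vocab, vectors


# Hypothetical placeholder posts, not drawn from the SIMAH data.
posts = [
    "great game last night",
    "that game was a great win",
    "traffic was terrible this morning",
]
vocab, vectors = tfidf_vectors(posts)
```

Each row of `vectors` is one post represented over the shared vocabulary, with terms that occur in fewer posts weighted more heavily. Vectors of this kind could then serve as input features to any of the classifiers listed in the abstract.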

Cite

Saeidi, M., Samuel, S. B., Milios, E., Zeh, N., & Berton, L. (2020). Categorizing Online Harassment on Twitter. In Communications in Computer and Information Science (Vol. 1168 CCIS, pp. 283–297). Springer. https://doi.org/10.1007/978-3-030-43887-6_22
