Abstract
This paper describes the system developed at the University of Alicante (UA) for SemEval 2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter. The purpose of this work is to build a strong baseline for hate speech detection by means of a traditional machine learning approach with standard textual features, which could serve as a reference for comparison with deep learning systems. We participated in both task A (Hate Speech Detection against Immigrants and Women) and task B (Aggressive Behavior and Target Classification) for both English and Spanish. Given the text of a tweet, task A consists of detecting hate speech against women or immigrants in the text, whereas task B consists of identifying the harassed target as individual or generic and classifying hateful tweets as aggressive or not aggressive. Despite its simplicity, our system obtained a remarkable macro-F1 score of 72.5 (sixth highest) and an accuracy of 73.6 (second highest) in Spanish (task A), outperforming more complex neural models among a total of 40 participating systems.
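The abstract reports both macro-F1 and accuracy, and the system ranks differently under each. The following minimal sketch shows how the two metrics can diverge on an imbalanced binary task. It assumes macro-F1 is the unweighted mean of per-class F1 scores; the labels are invented toy data, not the task's data, and the function names are illustrative.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed definition of macro-F1)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 1 = hateful, 0 = not hateful.
gold = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy(gold, pred))  # 0.8
print(macro_f1(gold, pred))  # 0.6875
```

In this toy case the classifier misses two of the three hateful tweets. Accuracy stays high because the majority class dominates, while macro-F1 drops because the minority class counts equally in the average.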
Cite
Perelló, C., Tomás, D., Garcia-Garcia, A., Garcia-Rodriguez, J., & Camacho-Collados, J. (2019). UA at SemEval-2019 task 5: Setting a strong linear baseline for hate speech detection. In NAACL HLT 2019 - International Workshop on Semantic Evaluation, SemEval 2019, Proceedings of the 13th Workshop (pp. 508–513). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s19-2091