Hate speech detection on Twitter: Feature engineering v.s. feature selection

David Robinson; Ziqi Zhang; Jonathan Tepper

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11155 LNCS 46-49

DOI: 10.1007/978-3-319-98192-5_9

Abstract

The increasing presence of hate speech on social media has drawn significant investment from governments and companies, as well as attention from empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear whether these features contribute equally to the performance of such methods. We conduct a feature selection analysis for this task using Twitter as a case study, and report findings that challenge the conventional perception of the importance of manual feature engineering: automatic feature selection can reduce the carefully engineered feature set by over 90%, and it selects predominantly generic features commonly used in many other language-related tasks; nevertheless, the resulting models perform better with automatically selected features than with carefully crafted task-specific features.
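
As a rough illustration of what automatic feature selection means in a supervised text classification setting, the sketch below scores bag-of-words tokens against a binary label and keeps only the top-ranked ones. It is not the authors' method: the chi-square statistic, the function names `chi2_scores` and `select_top_k`, and the toy documents and labels are all assumptions chosen for illustration.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Score each token by the chi-square statistic of its 2x2
    (token present/absent) x (label 1/0) contingency table."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    present = Counter()      # number of documents containing the token
    present_pos = Counter()  # number of label-1 documents containing it
    for text, y in zip(docs, labels):
        for tok in set(text.lower().split()):
            present[tok] += 1
            if y == 1:
                present_pos[tok] += 1
    scores = {}
    for tok, total in present.items():
        a = present_pos[tok]  # present, label 1
        b = total - a         # present, label 0
        c = n_pos - a         # absent, label 1
        d = n_neg - b         # absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:k]]

# Hypothetical toy data: label 1 marks the class of interest.
docs = ["spam offer now", "spam prize now",
        "meeting agenda now", "lunch agenda today"]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)
```

In a real pipeline the engineered features would be far richer than single tokens, and the retained subset would be passed to a classifier; the sketch only shows the selection step that shrinks the feature set.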

Citation (APA)

Robinson, D., Zhang, Z., & Tepper, J. (2018). Hate speech detection on Twitter: Feature engineering v.s. feature selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11155 LNCS, pp. 46–49). Springer Verlag. https://doi.org/10.1007/978-3-319-98192-5_9
