Psychosocial Features for Identifying Hate Speech in Social Media Text

  • Ombui E
  • Muchemi L
  • Wagacha P

Abstract

This study uses natural language processing to identify hate speech in codeswitched social media text. It trains nine models and evaluates how well each recognizes hate speech in a 50k-item human-annotated dataset. The article proposes a novel hierarchical approach that uses Latent Dirichlet Allocation (LDA) to build topic models, which in turn help construct a high-level psychosocial feature set we call PDC. PDC organizes words into word families, which helps capture codeswitching during preprocessing for supervised learning models. The PDC features are derived from a hate speech annotation framework informed by the duplex theory of hate. Frequency-based models using the PDC features on tweets from the 2012 and 2017 Kenyan presidential elections achieved an F-score of 83 percent (precision: 81 percent, recall: 85 percent) in recognizing hate speech. The study makes two contributions. First, it publicly releases a rich codeswitched dataset for comparative studies. Second, it describes how to create the PDC feature set to detect subtle forms of hate speech hidden in codeswitched data that previous approaches could not detect.
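
The word-family idea can be illustrated with a minimal sketch. The sketch assumes that English and Swahili surface forms of the same concept are collapsed into one family label, and that each post is then turned into a vector of family counts, which is one kind of frequency-based feature a supervised classifier could consume. The lexicon, family names, and function names below are hypothetical and hand-written for illustration. In the paper, the PDC feature set is built with the help of LDA topic models and an annotation framework, and this code does not reproduce that process. The `f_score` helper checks the reported figures: the harmonic mean of 81 percent precision and 85 percent recall is about 83 percent.

```python
import re
from collections import Counter

# Hypothetical word-family lexicon (NOT the authors' PDC lexicon): each family
# groups English and Swahili surface forms so that codeswitched variants of
# the same concept are counted together.
WORD_FAMILIES = {
    "enemy": {"enemy", "enemies", "adui", "maadui"},
    "leave": {"leave", "ondoka", "tokeni"},
}

# Invert the lexicon once so each token maps directly to its family label.
TOKEN_TO_FAMILY = {
    token: family
    for family, tokens in WORD_FAMILIES.items()
    for token in tokens
}


def family_counts(text: str) -> Counter:
    """Count word-family occurrences in one post (hypothetical helper)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(TOKEN_TO_FAMILY[t] for t in tokens if t in TOKEN_TO_FAMILY)


def feature_vector(text: str) -> list[int]:
    """Fixed-order vector of family counts, usable as classifier input."""
    counts = family_counts(text)
    return [counts[family] for family in sorted(WORD_FAMILIES)]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Codeswitched example: "maadui" (Swahili) and "enemies" (English)
    # land in the same "enemy" family.
    post = "Maadui wetu, tokeni! Enemies everywhere."
    print(feature_vector(post))            # [2, 1]  (enemy, leave)
    print(round(f_score(0.81, 0.85), 2))   # 0.83
```

Collapsing surface forms before counting keeps the feature space small and language-agnostic. A post that mixes languages therefore produces the same family counts as a monolingual post that expresses the same idea.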

Citation (APA)

Ombui, E., Muchemi, L., & Wagacha, P. (2021). Psychosocial Features for Identifying Hate Speech in Social Media Text. Journal of Education, Society and Behavioural Science, 32–51. https://doi.org/10.9734/jesbs/2021/v34i1230382