Un-Compromised Credibility: Social Media Based Multi-Class Hate Speech Classification for Text

Khubaib Ahmed Qureshi; Muhammad Sabih

Journal Article (Open Access)

IEEE Access (2021), vol. 9, pp. 109465–109477

DOI: 10.1109/ACCESS.2021.3101977

Citations: 44

Readers: 116

Abstract

Social media has grown enormously, and its anonymity promotes freedom of expression. Freedom of expression is a human right, but hate speech directed at a person or group on the basis of race, caste, religion, ethnic or national origin, sex, disability, gender identity, etc. is an abuse of that freedom. It promotes violence and hate crimes and creates an imbalance in society by damaging peace, credibility, and human rights. Detecting hate speech in social media discourse is essential but complex. The challenges include the availability of an appropriate, social media-specific dataset and of a high-performing supervised classifier for text-based hate speech detection. This study addresses these issues by providing a broad, balanced, social media-specific dataset with multi-class labels and a corresponding automatic classifier; the dataset captures language subtleties, is labeled under a comprehensive definition and well-defined rules, and is labeled with strong annotator agreement. Addressing different categories of hate separately, this paper aims to accurately predict their different forms by exploring a group of text mining features. Two distinct groups of features are explored for problem suitability: baseline features and self-discovered (new) features. Baseline features comprise the most commonly used effective features from related studies. Exploration found that a few of them, such as character and word n-grams, dependency tuples, sentiment scores, and counts of first- and second-person pronouns, are more efficient than others. Because latent semantic analysis (LSA) is applied for dimensionality reduction, the problem benefits from the use of many complex, non-linear models, of which CatBoost performed best. The proposed model is compared with related studies as well as with system baseline models. The results produced by the proposed model were very encouraging.
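
A few of the baseline features named in the abstract (character n-grams, word n-grams, and counts of first- and second-person pronouns) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the pronoun lists, the n-gram sizes, and the function names are all assumptions made for illustration. The dependency tuples, sentiment scores, LSA reduction, and CatBoost classifier are not shown.

```python
import re
from collections import Counter

# Hypothetical pronoun lists; the abstract does not give the exact lists used.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_ngrams(tokens, n):
    """Count contiguous word n-grams."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n):
    """Count contiguous character n-grams over the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def pronoun_counts(tokens):
    """Count first- and second-person pronouns among the tokens."""
    return {
        "first_person": sum(t in FIRST_PERSON for t in tokens),
        "second_person": sum(t in SECOND_PERSON for t in tokens),
    }


def extract_features(text):
    """Combine the illustrative features into one dictionary."""
    tokens = tokenize(text)
    return {
        "word_bigrams": word_ngrams(tokens, 2),    # n=2 is an assumed choice
        "char_trigrams": char_ngrams(text, 3),     # n=3 is an assumed choice
        **pronoun_counts(tokens),
    }


features = extract_features("You and your friends told me we were wrong")
```

In a full pipeline, count features of this kind would typically be vectorized into a sparse matrix before a dimensionality reduction step such as LSA is applied.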

Qureshi, K. A., & Sabih, M. (2021). Un-Compromised Credibility: Social Media Based Multi-Class Hate Speech Classification for Text. IEEE Access, 9, 109465–109477. https://doi.org/10.1109/ACCESS.2021.3101977
