Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments

Koyel Ghosh; Apurbalal Senapati

Journal Article (Open Access)

Natural Language Processing (2025) 31(2) 393-414

DOI: 10.1017/nlp.2024.28

Abstract

Cyberbullying, online harassment, and similar abuse delivered through offensive comments are pervasive across social media platforms such as Twitter™, Facebook™, and YouTube™. Hateful comments must be detected and removed to prevent harassment and violence on social media. In Natural Language Processing (NLP), comment classification is among the most prevalent tasks and remains challenging, and transformer-based language models are at the forefront of progress on it. This paper analyzes the performance of transformer-based language models such as BERT, ALBERT, RoBERTa, and DistilBERT on Indian hate speech datasets for binary classification. We use existing datasets, namely HASOC (Hindi and Marathi) and HS-Bangla. We evaluate several multilingual language models, such as MuRIL-BERT and XLM-RoBERTa, and several monolingual language models, such as RoBERTa-Hindi, Maha-BERT (Marathi), Bangla-BERT (Bangla), and Assamese-BERT (Assamese), and we also perform cross-lingual experiments. For further analysis, we perform multilingual, monolingual, and cross-lingual experiments on our Hate Speech Assamese (HS-Assamese; Indo-Aryan language family) and Hate Speech Bodo (HS-Bodo; Sino-Tibetan language family) datasets (HS dataset version 2) and achieve promising results. The cross-lingual experiments are intended to encourage researchers to explore what transformer models can do. Note that no pre-trained language models are currently available for Bodo or any other Sino-Tibetan language.
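
The abstract does not spell out the cross-lingual protocol. A minimal sketch, assuming the common setup in which a classifier is trained on one language's dataset and evaluated on every language's test set, and assuming macro-averaged F1 as the metric (the abstract names no metric), is shown below. Diagonal cells of the resulting grid correspond to the monolingual setting and off-diagonal cells to cross-lingual transfer. `keyword_trainer` is a hypothetical stand-in for fine-tuning a transformer such as MuRIL-BERT or XLM-RoBERTa, used only so that the sketch runs with the Python standard library; it is not the authors' method.

```python
from typing import Callable, Dict, List, Tuple

# A labelled example: (comment text, label), with 1 = hateful and 0 = not hateful.
Example = Tuple[str, int]
# A "trainer" takes training data and returns a predict function.
# In the paper's setting this step would be transformer fine-tuning;
# here any callable with this shape can be plugged in (assumption).
Trainer = Callable[[List[Example]], Callable[[str], int]]


def binary_f1(y_true: List[int], y_pred: List[int], positive: int) -> float:
    """F1 score treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 0 and 1."""
    return (binary_f1(y_true, y_pred, 0) + binary_f1(y_true, y_pred, 1)) / 2


def cross_lingual_grid(
    train_sets: Dict[str, List[Example]],
    test_sets: Dict[str, List[Example]],
    train: Trainer,
) -> Dict[Tuple[str, str], float]:
    """Train once per source language, then evaluate on every target language.

    Cells with source == target are monolingual; all others are cross-lingual.
    """
    results: Dict[Tuple[str, str], float] = {}
    for src, train_data in train_sets.items():
        predict = train(train_data)
        for tgt, test_data in test_sets.items():
            y_true = [label for _, label in test_data]
            y_pred = [predict(text) for text, _ in test_data]
            results[(src, tgt)] = macro_f1(y_true, y_pred)
    return results


def keyword_trainer(data: List[Example]) -> Callable[[str], int]:
    """Hypothetical stand-in classifier: predicts hateful (1) if the comment
    shares any whitespace token with a hateful training comment."""
    hateful_tokens = {tok for text, label in data if label == 1 for tok in text.split()}
    return lambda text: int(any(tok in hateful_tokens for tok in text.split()))
```

With toy data for two invented languages "A" and "B" that share no vocabulary, the stand-in scores perfectly on the diagonal and poorly off it, which is the gap a multilingual transformer's shared representations are meant to narrow.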

Cite (APA)

Ghosh, K., & Senapati, A. (2025). Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments. Natural Language Processing, 31(2), 393–414. https://doi.org/10.1017/nlp.2024.28