Detecting Offensive Tweets in Hindi-English Code-Switched Language

Citations: 104
Mendeley readers: 143

Abstract

The rapid rise of social media platforms such as Twitter, Facebook and Reddit in linguistically diverse regions has led to the hybridization of popular native languages with English in an effort to ease communication. This paper focuses on classifying offensive tweets written in Hinglish, a portmanteau of Hindi and English that refers to Hindi written in the Roman script and mixed with English. It introduces a novel dataset, the Hindi-English Offensive Tweet (HEOT) dataset, consisting of tweets in Hindi-English code-switched language split into three classes: non-offensive, abusive and hate-speech. It then approaches classification of the HEOT tweets with transfer learning: a model based on Convolutional Neural Networks is pre-trained on English tweets and then retrained on Hinglish tweets.

Cite

Mathur, P., Shah, R. R., Sawhney, R., & Mahata, D. (2018). Detecting Offensive Tweets in Hindi-English Code-Switched Language. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 18–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-3504
