Detecting Cyberbullying in Roman Urdu Language Using Natural Language Processing Techniques

Fahad Rasheed; Mehmoon Anwar; Imran Khan

Journal Article, Open Access

Pakistan Journal of Engineering and Technology (2022) 5(2) 198-203

DOI: 10.51846/vol5iss2pp198-203

Abstract

Social media platforms are now a primary channel for public communication and information. They have become an integral part of daily life, and their user base is expanding rapidly as access reaches more remote locations. Pakistan has around 71.70 million social media users who use Roman Urdu to communicate. With this growth in access and users, digital bullying, often known as cyberbullying, has also increased. This research focuses on social media users who communicate in Roman Urdu (the Urdu language written in the English alphabet). We examine cyberbullying on the Twitter platform, where users employ Roman Urdu as a medium of communication. To our knowledge, this is one of very few studies that address cyberbullying behavior in Roman Urdu. The study aims to identify a suitable model for classifying cyberbullying behavior in Roman Urdu. First, a dataset was built by extracting tweets through Twitter's API, using Roman Urdu keywords to target the relevant data. The data was then annotated as bully or not-bully. Next, the dataset was pre-processed to reduce noise by removing punctuation, stop words, null entries, and duplicates. Features were then extracted using two methods, Count-Vectorizer (CV) and TF-IDF Vectorizer, and ten supervised learning algorithms, including SVM, MLP, and KNN, were applied to both feature sets. Support Vector Machine (SVM) performed best on both feature sets, scoring 97.8 percent on the TF-IDF features and 93.4 percent on the CV features. The proposed mechanism could help online social apps and chat rooms detect bullying and design bully-word filters, making cyberspace safer for end users.
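The pre-processing and feature-extraction steps described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The stop-word list and the function names (`preprocess`, `count_features`, `tfidf_features`) are hypothetical, and the TF-IDF weighting shown (smoothed IDF with L2 row normalization, which mirrors scikit-learn's default behavior) is an assumption, since the abstract does not state which variant was used. The classification step (SVM, MLP, KNN, and so on) is left out.

```python
import math
import string
from collections import Counter

# Hypothetical Roman Urdu stop words, for illustration only;
# the paper's actual stop-word list is not given in the abstract.
STOP_WORDS = {"hai", "ka", "ki", "ko", "se", "ye", "to"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweets):
    """Remove null entries, punctuation, stop words, and duplicates."""
    seen = set()
    cleaned = []
    for text in tweets:
        if not text or not text.strip():
            continue  # skip null or empty entries
        tokens = [
            tok
            for tok in text.lower().translate(_PUNCT_TABLE).split()
            if tok not in STOP_WORDS
        ]
        key = " ".join(tokens)
        if not key or key in seen:
            continue  # skip tweets that are empty or repeated after cleaning
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


def count_features(docs):
    """Bag-of-words counts: one row per document, one column per term."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_features(docs):
    """TF-IDF weights with smoothed IDF and L2-normalized rows (assumed variant)."""
    vocab, counts = count_features(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return vocab, rows
```

Either feature matrix could then be passed, together with the bully / not-bully labels, to any supervised classifier. Keeping the two extractors behind the same return shape makes it straightforward to compare the same algorithm on both feature sets, as the study does.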

Cite

Rasheed, F., Anwar, M., & Khan, I. (2022). Detecting Cyberbullying in Roman Urdu Language Using Natural Language Processing Techniques. Pakistan Journal of Engineering and Technology, 5(2), 198–203. https://doi.org/10.51846/vol5iss2pp198-203
