Policy-Based Spam Detection of Tweets Dataset

5Citations
Citations of this article
17Readers
Mendeley users who have this article in their library.

Abstract

Spam communications from spam ads and social media platforms such as Facebook, Twitter, and Instagram are increasing, making spam detection more popular. Many languages are used for spam review identification, including Chinese, Urdu, Roman Urdu, English, Turkish, etc.; however, there are fewer high-quality datasets available for Urdu. This is mainly because Urdu is less extensively used on social media networks such as Twitter, making it harder to collect huge volumes of relevant data. This paper investigates policy-based Urdu tweet spam detection. This study aims to collect over 1,100,000 real-time tweets from multiple users. The dataset is carefully filtered to comply with Twitter’s 100-tweet-per-hour limit. For data collection, the snscrape library is utilized, which is equipped with an API for accessing various attributes such as username, URL, and tweet content. Then, a machine learning pipeline consisting of TF-IDF, Count Vectorizer, and the following machine learning classifiers: multinomial naïve Bayes, support vector classifier RBF, logical regression, and BERT, are developed. Based on Twitter policy standards, feature extraction is performed, and the dataset is separated into training and testing sets for spam analysis. Experimental results show that the logistic regression classifier has achieved the highest accuracy, with an F1-score of 0.70 and an accuracy of 99.55%. The findings of the study show the effectiveness of policy-based spam detection in Urdu tweets using machine learning and BERT layer models and contribute to the development of a robust Urdu language social media spam detection method.

Cite

CITATION STYLE

APA

Dar, M., Iqbal, F., Latif, R., Altaf, A., & Jamail, N. S. M. (2023). Policy-Based Spam Detection of Tweets Dataset. Electronics (Switzerland), 12(12). https://doi.org/10.3390/electronics12122662

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free