Policy-Based Spam Detection of Tweets Dataset

Momna Dar; Faiza Iqbal; Rabia Latif; Ayesha Altaf; Nor Shahida Mohd Jamail

Journal Article (Open Access)

Electronics (Switzerland) (2023) 12(12)

DOI: 10.3390/electronics12122662

5 Citations

17 Readers

Abstract

Spam communications from spam ads and social media platforms such as Facebook, Twitter, and Instagram are increasing, making spam detection a more popular research topic. Spam review identification has been studied in many languages, including Chinese, Urdu, Roman Urdu, English, and Turkish; however, fewer high-quality datasets are available for Urdu. This is mainly because Urdu is less extensively used on social media networks such as Twitter, making it harder to collect large volumes of relevant data. This paper investigates policy-based spam detection for Urdu tweets. The study aims to collect over 1,100,000 real-time tweets from multiple users. The dataset is carefully filtered to comply with Twitter’s 100-tweet-per-hour limit. Data are collected with the snscrape library, which provides an API for accessing attributes such as username, URL, and tweet content. A machine learning pipeline is then developed, consisting of TF-IDF and Count Vectorizer features and the following classifiers: multinomial naïve Bayes, a support vector classifier with an RBF kernel, logistic regression, and BERT. Feature extraction is performed based on Twitter policy standards, and the dataset is split into training and testing sets for spam analysis. Experimental results show that the logistic regression classifier achieved the highest accuracy, with an F1-score of 0.70 and an accuracy of 99.55%. The findings show the effectiveness of policy-based spam detection in Urdu tweets using machine learning and BERT layer models, and they contribute to the development of a robust social media spam detection method for the Urdu language.
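To make the count-vector and multinomial naïve Bayes stage of such a pipeline concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the whitespace tokenizer, the `MultinomialNB` class, the Laplace smoothing value, and the four toy English training texts are all assumptions introduced for illustration, and a real system would use Urdu-aware tokenization and the policy-based features described above.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real Urdu tokenization)."""
    return text.lower().split()


class MultinomialNB:
    """Toy multinomial naive Bayes over raw word counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        # alpha is the Laplace (add-alpha) smoothing constant; 1.0 is an assumption.
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Per-class word counts play the role of a count-vectorizer matrix.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of all words seen in training.
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def _log_likelihood(self, word, c):
        # Smoothed estimate of P(word | class).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(doc):
                if word in self.vocab:  # words unseen in training are skipped
                    score += self._log_likelihood(word, c)
            scores[c] = score
        # Return the class with the highest log-posterior score.
        return max(scores, key=scores.get)


# Toy, made-up training data (not drawn from the paper's dataset).
train_docs = [
    "win free prize click link",
    "free followers click now",
    "meeting moved to monday",
    "lunch with family today",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("click for free prize"))
print(model.predict("family meeting monday"))
```

Scores are summed in log space to avoid numeric underflow on longer texts. Replacing the raw counts with TF-IDF weights, or the classifier with logistic regression, would change only the feature and scoring steps of this sketch.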


Dar, M., Iqbal, F., Latif, R., Altaf, A., & Jamail, N. S. M. (2023). Policy-Based Spam Detection of Tweets Dataset. Electronics (Switzerland), 12(12). https://doi.org/10.3390/electronics12122662
