Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity

Khalid Binsaeed; Gianluca Stringhini; Ahmed E. Youssef

Journal Article, Open Access

International Journal of Advanced Computer Science and Applications (2020) 11(11) 11-22

DOI: 10.14569/IJACSA.2020.0111103

12 Citations

38 Readers

Abstract

Detecting malicious activity on the Internet has been, and continues to be, a critical issue that must be addressed effectively. Doing so is essential to protect personal information, computing resources, and financial assets from unsolicited actions such as credential theft, malware download and installation, and extortion. The rise of social media platforms such as Twitter has given malicious users a new and promising platform for their activities, which range from sending a simple spam message to taking full control of the victim’s machine. Twitter has revealed that its spam detection algorithms are not very effective; most trending hashtags include unrelated spam and advertising tweets, which indicates a problem with the spam detection framework currently in use. This paper proposes a new approach for detecting spam in Twitter microblogging using Machine Learning (ML) techniques and domain popularity services. The proposed approach comprises two main stages: 1) Tweets are collected periodically and filtered by selecting those that appear more often than a chosen threshold within the specified period (i.e., common tweets). The common tweets are then inspected by checking each tweet’s associated URL domain against Alexa’s list of the top one million globally viewed websites. If a tweet is common on Twitter but its domain does not appear in that list, it is flagged as potential spam. 2) ML algorithms are then run on the flagged tweets to extract features that help detect the spam cluster and prevent it in real time. The performance of the proposed approach was evaluated using three popular classification models (random forest, J48, and Naïve Bayes). For all classifiers, the results showed the effectiveness of the proposed method in terms of several performance metrics (e.g., precision, sensitivity, F1-score, accuracy) under different test scenarios.
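
The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `extract_domains` and `flag_potential_spam`, the exact-text matching used to count "common" tweets, the decision to skip tweets without a URL, and the representation of the popularity list as a plain set of domain names are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: URLs are found with a simple pattern; real tweets may need
# shortened links to be expanded before the domain can be checked.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(text):
    """Return the lowercase host names of all URLs found in a tweet."""
    domains = []
    for url in URL_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern may have captured.
        url = url.rstrip(".,;:!?)")
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    return domains


def flag_potential_spam(tweets, popular_domains, min_count):
    """Flag tweets that are common but link to a domain not in the list.

    tweets          -- tweet texts collected during one period
    popular_domains -- set of domains from a popularity list (hypothetical input)
    min_count       -- a tweet is "common" if it appears more than this often
    """
    counts = Counter(tweets)  # assumption: identical text means the same tweet
    flagged = []
    for text, count in counts.items():
        if count <= min_count:
            continue  # not common enough in this period
        domains = extract_domains(text)
        # Assumption: tweets without any URL are left to other checks.
        if domains and any(d not in popular_domains for d in domains):
            flagged.append(text)
    return flagged
```

For example, with `popular_domains = {"example.com"}` and `min_count = 2`, a tweet linking to `http://spam-example.test/win` that appears three times would be flagged, while an equally frequent tweet linking to `http://www.example.com/a` would not. A popularity list typically contains registered domains, so a fuller version would also need to map subdomains to their registered domain before the lookup.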

Cite

Binsaeed, K., Stringhini, G., & Youssef, A. E. (2020). Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity. International Journal of Advanced Computer Science and Applications, 11(11), 11–22. https://doi.org/10.14569/IJACSA.2020.0111103
