An ensemble learning approach for addressing the class imbalance problem in twitter spam detection

11Citations
Citations of this article
38Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Being an important source for real-time information dissemination in recent years, Twitter is inevitably a prime target of spammers. It has been showed that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, a lot of recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of the studies overlook a fundamental issue that is widely seen in real-world Twitter data, i.e., the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on spam detection rate. To address the problem, we propose an ensemble learning approach, which involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the next step, a classification model is built upon each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.

Cite

CITATION STYLE

APA

Liu, S., Wang, Y., Chen, C., & Xiang, Y. (2016). An ensemble learning approach for addressing the class imbalance problem in twitter spam detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9722, pp. 215–228). Springer Verlag. https://doi.org/10.1007/978-3-319-40253-6_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free