Digital waste sorting: A goal-based, self-learning approach to label spam email campaigns

Abstract

Fast analysis of correlated spam emails may be vital to finding and prosecuting spammers who commit cybercrimes such as phishing and online fraud. This paper presents a self-learning framework that automatically divides and classifies large amounts of spam email into correlated, labeled groups. Working on large datasets collected daily through honeypots, the framework first divides the emails into homogeneous groups of similar messages (campaigns), each of which can be related to a specific spammer. Each campaign is then associated with a class that specifies the spammer's goal, e.g. phishing or advertisement. The framework uses a categorical clustering algorithm to group similar emails and a classifier to subsequently label each email group. Its main advantage is that it can be applied to large spam email datasets for which no prior knowledge is available. The approach has been tested on more than 3200 real, recent spam emails divided into more than 60 campaigns, achieving a classification accuracy of 97% on the classified data.
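The two-stage pipeline described above (cluster emails into campaigns, then label each campaign with a goal) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its categorical clustering algorithm, so the sketch swaps in a generic entropy-driven divisive split. The classifier is likewise a stand-in: a 1-nearest-neighbour rule over each campaign's most frequent attribute values. The attribute names, the goal labels (including "malware"), the helper functions, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical categorical attributes extracted from each spam email.
ATTRS = ["has_link", "has_attachment", "lang", "mentions_bank"]


def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def cluster(emails, attrs, max_entropy=0.0, min_size=2):
    """Divisive categorical clustering (generic stand-in, not the paper's algorithm).

    Recursively splits a group on its most heterogeneous attribute until every
    attribute's entropy is at most max_entropy or the group is smaller than min_size.
    """
    if len(emails) < min_size:
        return [emails]
    scores = {a: entropy([e[a] for e in emails]) for a in attrs}
    best = max(scores, key=scores.get)
    if scores[best] <= max_entropy:
        return [emails]  # homogeneous enough: treat the group as one campaign
    groups = {}
    for e in emails:
        groups.setdefault(e[best], []).append(e)
    campaigns = []
    for group in groups.values():
        campaigns.extend(cluster(group, attrs, max_entropy, min_size))
    return campaigns


def representative(campaign, attrs):
    """Most frequent value of each attribute within a campaign."""
    return {a: Counter(e[a] for e in campaign).most_common(1)[0][0] for a in attrs}


def hamming(x, y, attrs):
    """Number of attributes on which two categorical records differ."""
    return sum(x[a] != y[a] for a in attrs)


def label(campaign, attrs, training):
    """Hypothetical 1-nearest-neighbour classifier assigning a goal to a whole campaign."""
    rep = representative(campaign, attrs)
    _, goal = min(training, key=lambda t: hamming(rep, t[0], attrs))
    return goal


def rec(link, attach, lang, bank):
    """Build a hypothetical email record from its attribute values."""
    return dict(zip(ATTRS, (link, attach, lang, bank)))


# Toy data: hypothetical emails and a tiny hypothetical labeled training set.
EMAILS = (
    [rec("yes", "no", "en", "yes")] * 2
    + [rec("yes", "no", "en", "no")] * 2
    + [rec("no", "yes", "de", "no")]
)
TRAINING = [
    (rec("yes", "no", "en", "yes"), "phishing"),
    (rec("yes", "no", "en", "no"), "advertisement"),
    (rec("no", "yes", "en", "no"), "malware"),
]

campaigns = cluster(EMAILS, ATTRS)
labels = [label(c, ATTRS, TRAINING) for c in campaigns]
print([len(c) for c in campaigns], labels)
```

In this sketch the clustering stage needs no labels, which mirrors the abstract's point that no prior knowledge of the dataset is required, whereas the classifier relies on a small labeled set. How the paper's classifier is trained is not stated in the abstract.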

Citation (APA)

Sheikhalishahi, M., Saracino, A., Mejri, M., Tawbi, N., & Martinelli, F. (2015). Digital waste sorting: A goal-based, self-learning approach to label spam email campaigns. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9331, pp. 3–19). Springer Verlag. https://doi.org/10.1007/978-3-319-24858-5_1
