Digital waste sorting: A goal-based, self-learning approach to label spam email campaigns

Mina Sheikhalishahi; Andrea Saracino; Mohamed Mejri; Nadia Tawbi; Fabio Martinelli

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015), vol. 9331, pp. 3–19

DOI: 10.1007/978-3-319-24858-5_1

Abstract

Fast analysis of correlated spam emails may be vital in the effort to find and prosecute spammers who commit cybercrimes such as phishing and online fraud. This paper presents a self-learning framework that automatically divides and classifies large amounts of spam email into correlated, labeled groups. Building on large datasets collected daily through honeypots, the emails are first divided into homogeneous groups of similar messages (campaigns), each of which can be related to a specific spammer. Each campaign is then associated with a class that specifies the goal of the spammer, e.g. phishing or advertisement. The proposed framework uses a categorical clustering algorithm to group similar emails, and a classifier to subsequently label each email group. The main advantage of the proposed framework is that it can be used on large spam email datasets for which no prior knowledge is provided. The approach has been tested on more than 3200 real and recent spam emails, divided into more than 60 campaigns, reporting a classification accuracy of 97% on the classified data.
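The two-stage pipeline described in the abstract (group emails into campaigns, then label each campaign with a goal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the feature names, the exact-match grouping that stands in for a categorical clustering algorithm, and the fixed rules that stand in for a trained classifier are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical categorical features; the abstract does not list the real ones.
FEATURES = ("language", "has_link", "subject_type")

# Toy data invented for this sketch, not taken from the paper's dataset.
EMAILS = [
    {"language": "en", "has_link": "yes", "subject_type": "account"},
    {"language": "en", "has_link": "yes", "subject_type": "account"},
    {"language": "en", "has_link": "yes", "subject_type": "offer"},
    {"language": "it", "has_link": "no", "subject_type": "offer"},
]


def group_into_campaigns(emails, features):
    """Stage 1: put emails with identical categorical values in one group.

    Exact matching is a deliberately simple stand-in for a categorical
    clustering algorithm; it needs no prior labels.
    """
    campaigns = defaultdict(list)
    for email in emails:
        key = tuple(email[name] for name in features)
        campaigns[key].append(email)
    return dict(campaigns)


def label_campaign(key, features):
    """Stage 2: assign a goal label to a whole campaign.

    The rule below is hypothetical; a real system would use a classifier
    trained on labeled campaigns.
    """
    values = dict(zip(features, key))
    if values["subject_type"] == "account" and values["has_link"] == "yes":
        return "phishing"
    return "advertisement"


campaigns = group_into_campaigns(EMAILS, FEATURES)
labels = {key: label_campaign(key, FEATURES) for key in campaigns}

for key, members in campaigns.items():
    print(key, len(members), labels[key])
```

Labeling is done once per campaign rather than once per email, which mirrors the abstract's order of operations: clustering first, classification of each group second.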

Cite (APA)

Sheikhalishahi, M., Saracino, A., Mejri, M., Tawbi, N., & Martinelli, F. (2015). Digital waste sorting: A goal-based, self-learning approach to label spam email campaigns. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9331, pp. 3–19). Springer Verlag. https://doi.org/10.1007/978-3-319-24858-5_1
