A heuristic-based feature selection method for clustering spam emails

Jungsuk Song; Masashi Eto; Hyung Chan Kim; Daisuke Inoue; Koji Nakao

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6443 LNCS, Part 1, pp. 290–297 (2010)

DOI: 10.1007/978-3-642-17537-4_36

Abstract

In recent years, many efforts have been made to cluster spam emails in order to cope with spam-based attacks. During clustering, many statistical features (e.g., the size of emails) are used to calculate similarities between spam emails. In many cases, however, some of these features are redundant or contribute little to the clustering process. Feature selection is one of the most common ways to identify a subset of key features from an initial set. In this paper, we propose a heuristic-based feature selection method for clustering spam emails. Unlike existing methods, which form combinations of the given features and evaluate them using data mining and machine learning techniques, our method evaluates each feature based only on its value distribution within spam clusters. With our method, we identified 4 significant features, which yielded a clustering accuracy of 86.33% with low time complexity. © 2010 Springer-Verlag.
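The abstract does not spell out the heuristic itself, so the following is only an illustrative sketch of the general idea of scoring each feature independently from its value distribution inside clusters. It is not the authors' method: the scoring rule (within-cluster variance divided by overall variance) is a stand-in chosen for illustration, and the names `feature_scores`, `select_features`, `rows`, and `labels` are hypothetical.

```python
from statistics import pvariance


def feature_scores(rows, labels):
    """Score each feature on its own, without combining features.

    ASSUMPTION: the score is the size-weighted within-cluster variance
    divided by the overall variance. This is an illustrative stand-in,
    not the heuristic from the paper. A lower score means the feature's
    values are tightly concentrated inside each cluster.
    """
    # Group the feature vectors by cluster label.
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)

    scores = []
    for j in range(len(rows[0])):
        overall = pvariance([r[j] for r in rows])
        if overall == 0:
            # A constant feature cannot separate clusters; give it the
            # same score as a feature with no cluster structure.
            scores.append(1.0)
            continue
        within = sum(
            len(members) * pvariance([r[j] for r in members])
            for members in clusters.values()
        ) / len(rows)
        scores.append(within / overall)
    return scores


def select_features(rows, labels, k):
    """Return the indices of the k lowest-scoring features."""
    scores = feature_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j])[:k]


# Hypothetical toy data: (email size, feature with no cluster structure,
# constant feature) for four emails in two clusters.
rows = [(1000, 3, 7), (1010, 9, 7), (5000, 4, 7), (5020, 8, 7)]
labels = [0, 0, 1, 1]
print(select_features(rows, labels, 1))
```

Because each feature is scored in a single pass over the data, the cost grows linearly with the number of features, in contrast to approaches that evaluate combinations of features.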

Citation (APA)

Song, J., Eto, M., Kim, H. C., Inoue, D., & Nakao, K. (2010). A heuristic-based feature selection method for clustering spam emails. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6443 LNCS, pp. 290–297). https://doi.org/10.1007/978-3-642-17537-4_36
