Classification Spam Email with Elimination of Unsuitable Features with Hybrid of GA-Naive Bayes

Abstract

Email spam is a security problem that has been addressed with a variety of machine learning techniques. The growth of spam makes organisational email services unreliable and directly exposes clients to threats delivered through unsolicited mail, such as ransomware. Several methods exist for identifying spam emails. Most of them focus on feature selection; however, these models reduce detection accuracy. This paper proposes a novel spam detection method that not only avoids this loss of accuracy but also eliminates unsuitable features with less processing. The features are content-based terms, and because their number is very large, eliminating unsuitable ones can reduce memory complexity. We use a sample of text emails from Hewlett-Packard (HP) Laboratories. First, a genetic algorithm (GA) selects features, with no limit on the number selected, using Bayesian theory as the fitness function; it is evaluated with different numbers of repetitions. The GA results improve as the number of repetitions increases, and were tested with two distinct selection methods, random selection and tournament selection. In the second stage, Naive Bayes classifies the emails in the dataset as spam or ham. The results show that Naive Bayes and the hybrid GA-Naive Bayes perform almost identically, but GA-Naive Bayes performs better.
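
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the Bernoulli form of Naive Bayes, validation accuracy as the fitness function, one-point crossover, elitism, seeding the population with the all-features mask, and every function and parameter name (`make_toy_data`, `ga_select`, `p_mut`, and so on) are assumptions made for illustration. Only tournament selection is shown; `generations` plays the role of the number of repetitions.

```python
import math
import random


def make_toy_data(n_rows, n_feats=12, seed=1):
    """Synthetic binary data (assumption): features 0-2 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)  # 1 = spam, 0 = ham
        row = [int(rng.random() < (0.8 if label else 0.2)) if j < 3 else rng.randint(0, 1)
               for j in range(n_feats)]
        X.append(row)
        y.append(label)
    return X, y


def train_nb(X, y, mask):
    """Fit a Bernoulli Naive Bayes model using only the features where mask[j] == 1."""
    feats = [j for j, keep in enumerate(mask) if keep]
    model = {"feats": feats, "prior": {}, "prob": {}}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        model["prior"][c] = (len(rows) + 1) / (len(X) + 2)
        model["prob"][c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in feats]
    return model


def predict_nb(model, x):
    """Return the class with the larger log-posterior score."""
    scores = {}
    for c in (0, 1):
        s = math.log(model["prior"][c])
        for p, j in zip(model["prob"][c], model["feats"]):
            s += math.log(p if x[j] else 1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)


def accuracy(X_train, y_train, X_val, y_val, mask):
    """Fitness (assumption): validation accuracy of Naive Bayes on the masked features."""
    if not any(mask):
        return 0.0  # an empty feature set is treated as the worst candidate
    model = train_nb(X_train, y_train, mask)
    hits = sum(predict_nb(model, x) == label for x, label in zip(X_val, y_val))
    return hits / len(y_val)


def tournament(pop, fits, rng, k=3):
    """Tournament selection: draw k candidates at random and keep the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def ga_select(X_train, y_train, X_val, y_val, generations=20, pop_size=20, p_mut=0.05, seed=0):
    """Search over bit masks with no cap on how many features are selected."""
    rng = random.Random(seed)
    n = len(X_train[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # include the plain Naive Bayes baseline (all features)
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [accuracy(X_train, y_train, X_val, y_val, m) for m in pop]
        for m, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = m[:], f
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return best, best_fit


if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(200, seed=1)
    X_va, y_va = make_toy_data(100, seed=2)
    mask, fit = ga_select(X_tr, y_tr, X_va, y_va)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("validation accuracy with selected features:", fit)
    print("validation accuracy with all features:", accuracy(X_tr, y_tr, X_va, y_va, [1] * 12))
```

Because the all-features mask is part of the first generation, the selected mask can never score below plain Naive Bayes on the validation split in this sketch. A fair comparison would still need a separate held-out test set, since the mask is tuned on the validation data.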

Cite

Ebadati, O. M. E., & Ahmadzadeh, F. (2019). Classification Spam Email with Elimination of Unsuitable Features with Hybrid of GA-Naive Bayes. Journal of Information and Knowledge Management, 18(1). https://doi.org/10.1142/S0219649219500084
