The email inbox is a dangerous place, but pattern recognition tools can filter out most of the unwanted messages that may harm end users. Because phishing and spam strategies are adversarial and change over time, the number of variables needed to classify email properly has grown substantially. For many years these factors have driven the pattern recognition and machine learning communities to keep improving email filtering techniques. This work presents an embedded feature selection approach that determines a non-linear decision boundary with minimal error and a reduced number of features by penalizing their use in the dual formulation of binary Support Vector Machines (SVMs). The proposed method optimizes the widths of an anisotropic RBF kernel via successive gradient descent steps, eliminating features that have low relevance for the model. Experiments on two real-world spam and phishing data sets show that our approach outperforms well-known feature selection algorithms while consistently using fewer variables.
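To make the role of the anisotropic RBF kernel more concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It assumes a kernel of the form K(x, z) = exp(-0.5 * sum_j w_j^2 (x_j - z_j)^2), with one width w_j per feature, and a hypothetical pruning threshold. The function names, the exact parametrization, and the threshold value are illustrative assumptions. The gradient descent update of the widths on the penalized dual objective is not shown.

```python
import numpy as np


def anisotropic_rbf_kernel(X, Z, widths):
    """Gram matrix of an anisotropic RBF kernel (assumed parametrization).

    K[i, s] = exp(-0.5 * sum_j widths[j]**2 * (X[i, j] - Z[s, j])**2)
    A width of zero removes the corresponding feature from the kernel.
    """
    Xs = X * widths  # scale each feature by its own width
    Zs = Z * widths
    # Squared Euclidean distances between all scaled rows of X and Z.
    sq_dist = (
        np.sum(Xs**2, axis=1)[:, None]
        + np.sum(Zs**2, axis=1)[None, :]
        - 2.0 * Xs @ Zs.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def prune_features(widths, threshold=1e-3):
    """Set widths below a (hypothetical) threshold to zero.

    This mimics eliminating low-relevance features after a width update.
    """
    pruned = np.array(widths, dtype=float)
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned
```

In such a scheme, an SVM would be retrained on the Gram matrix produced with the pruned widths, so that features whose width reached zero no longer influence the decision boundary.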
CITATION STYLE
Maldonado, S., & L’Huillier, G. (2013). SVM-Based Feature Selection and Classification for Email Filtering (pp. 135–148). https://doi.org/10.1007/978-3-642-36530-0_11