Latent Semantic Indexing (LSI) is a semantic vector space model for information retrieval (IR) that uses singular value decomposition (SVD) to transform individual documents into statistically derived semantic vectors. This paper proposes 2LSI-SF, a new junk email (spam) filtering model that is built on augmented category LSI spaces and classifies email messages by their content. The model exploits the discriminative information in the training data and incorporates several pertinent feature selection and message classification algorithms. Experiments with 2LSI-SF were conducted on a benchmark spam testing corpus (PU1) and a newly compiled Chinese spam corpus (ZH1). The experimental results, together with a performance comparison against the popular Support Vector Machines (SVM) and naïve Bayes classifiers, show that 2LSI-SF is capable of filtering spam effectively.
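To illustrate the LSI step the abstract refers to, the following is a minimal sketch of generic LSI using a truncated SVD. It is not the 2LSI-SF model: the toy messages, the raw term-count weighting, the rank k = 2, and all function names (`build_term_doc_matrix`, `lsi_fit`, `fold_in`, `cosine`) are assumptions made purely for illustration.

```python
import numpy as np

# Toy corpus (hypothetical): two spam-like and two legitimate messages.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]

def build_term_doc_matrix(docs):
    """Return a term-by-document count matrix and the term -> row index map."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Rank-k SVD of A: returns U_k, the top-k singular values, and document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Row j of V_k is the k-dimensional semantic vector of document j.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(text, index, U_k, s_k):
    """Project a new message into the LSI space: q_hat = q^T U_k S_k^{-1}."""
    q = np.zeros(U_k.shape[0])
    for w in text.lower().split():
        if w in index:  # terms outside the training vocabulary are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

A, index = build_term_doc_matrix(docs)
U_k, s_k, doc_vecs = lsi_fit(A, k=2)

query_vec = fold_in("cheap pills now", index, U_k, s_k)
for j, d in enumerate(docs):
    print(f"{cosine(query_vec, doc_vecs[j]):+.3f}  {d}")
```

In this sketch a new message is compared with the training documents by cosine similarity in the reduced space. A content-based filter could then label the message according to its most similar training messages, although the specific feature selection and classification rules used by 2LSI-SF are not described in the abstract.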
Jiang, E. (2006). Learning to Semantically Classify Email Messages. In Lecture Notes in Control and Information Sciences (Vol. 344, pp. 700–711). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-540-37256-1_86