Learning to Semantically Classify Email Messages

Eric Jiang

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2006), 700-711

DOI: 10.1007/978-3-540-37256-1_86

Abstract

Latent Semantic Indexing (LSI) is a semantic vector space model for information retrieval (IR) that uses singular value decomposition (SVD) to transform individual documents into statistically derived semantic vectors. This paper proposes 2LSI-SF, a new junk email (spam) filtering model that is based on augmented category LSI spaces and classifies email messages by their content. The model exploits the discriminative information in the training data and incorporates several feature selection and message classification algorithms. 2LSI-SF was evaluated on a benchmark spam corpus (PU1) and a newly compiled Chinese spam corpus (ZH1). The experimental results, together with a performance comparison against the popular Support Vector Machine (SVM) and naïve Bayes classifiers, show that 2LSI-SF filters spam effectively.
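
The LSI step mentioned in the abstract can be illustrated with a minimal sketch. It shows generic LSI only (truncated SVD of a term-by-document matrix, followed by folding a new message into the reduced space) and is not the 2LSI-SF model, whose augmented category spaces and feature selection are not described on this page. The toy vocabulary, the counts, and the function names lsi_fit, lsi_fold_in and cosine are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix.

    Returns (U_k, s_k, doc_vectors), where each row of doc_vectors is one
    document expressed in the k-dimensional semantic space.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def lsi_fold_in(term_vec, U_k, s_k):
    """Project a new term-count vector into the same space: q^T U_k S_k^{-1}."""
    return (term_vec @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy data: rows are terms, columns are messages.
# Columns 0-1 play the role of spam, columns 2-3 of legitimate mail.
terms = ["free", "money", "offer", "meeting", "agenda", "report"]
A = np.array([
    [2, 1, 0, 0],   # free
    [1, 2, 0, 0],   # money
    [1, 1, 0, 0],   # offer
    [0, 0, 3, 1],   # meeting
    [0, 0, 1, 1],   # agenda
    [0, 0, 1, 2],   # report
], dtype=float)

U_k, s_k, doc_vectors = lsi_fit(A, k=2)

# A new message containing "free" and "offer", folded into the LSI space.
query = np.array([1, 0, 1, 0, 0, 0], dtype=float)
q_vec = lsi_fold_in(query, U_k, s_k)

# Similarity of the new message to each training message.
sims = [cosine(q_vec, d) for d in doc_vectors]
```

In this toy setup the new message ends up far closer to the two spam columns than to the two legitimate ones. A real filter would still need a classification rule on top of these similarities, which is where a model such as 2LSI-SF adds its own machinery.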

Citation (APA)

Jiang, E. (2006). Learning to Semantically Classify Email Messages. In Lecture Notes in Control and Information Sciences (Vol. 344, pp. 700–711). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-540-37256-1_86
