Combining supervised and semi-supervised classifier for personalized spam filtering

Victor Cheng; Chun Hung Li

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4426 LNAI 449-456

DOI: 10.1007/978-3-540-71701-0_45

7 Citations

8 Readers

Abstract

This paper addresses the problem of spam filtering for an individual email user under the condition that only public-domain labeled emails are available as training data and all emails in the user's inbox are unlabeled. Owing to differences in wording and in the distribution of emails, a conventional supervised classifier such as an SVM cannot produce accurate results, because it assumes that the training and testing data come from the same source and have the same distribution. We model these discrepancies as variation of the decision hyperplane and derive a criterion for selecting reliable emails whose classified labels the user is likely to agree with. A semi-supervised classifier then uses these emails as its training set and propagates the label information to the other unlabeled emails by exploiting their distribution in feature space. Experimental results show that this combined classifier strategy can classify emails for an individual user with high accuracy. © Springer-Verlag Berlin Heidelberg 2007.
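
The two-stage strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption: a perceptron stands in for the SVM, "reliable" is taken to mean that the absolute decision score is at least a hypothetical threshold (the abstract does not state the actual criterion), and the semi-supervised step is a simple iterative label propagation over an RBF similarity graph. The function names and the toy 2-D data are invented for illustration.

```python
import math

def train_linear(X, y, epochs=20, lr=0.1):
    # Perceptron as a stand-in for the supervised classifier (assumption; the abstract names an SVM).
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def score(w, b, x):
    # Signed decision score; its sign gives the predicted label.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def select_reliable(w, b, X, threshold):
    # Assumed criterion: keep emails far from the hyperplane, whose labels
    # would survive a small shift of that hyperplane.
    seeds = {}
    for i, x in enumerate(X):
        s = score(w, b, x)
        if abs(s) >= threshold:
            seeds[i] = 1 if s > 0 else -1
    return seeds

def propagate(X, seeds, sigma=1.0, iters=100):
    # Label propagation: seed labels stay clamped; every other point repeatedly
    # takes the similarity-weighted average of its neighbours' values.
    n = len(X)
    W = [[0.0 if i == j else
          math.exp(-sum((a - c) ** 2 for a, c in zip(X[i], X[j])) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    f = [float(seeds.get(i, 0)) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            total = sum(W[i])
            if i not in seeds and total > 0:
                f[i] = sum(W[i][j] * f[j] for j in range(n)) / total
    return [1 if v > 0 else -1 for v in f]

# Toy data (invented): +1 = spam, -1 = ham.
public_X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
public_y = [1, 1, 1, -1, -1, -1]
# The user's ham cluster stretches across the public decision boundary.
user_X = [[3, 3], [2.5, 2.5], [2, 2.6], [-3, -3], [-2, -2], [-1, -1], [0, 0.2]]

w, b = train_linear(public_X, public_y)
supervised = [1 if score(w, b, x) > 0 else -1 for x in user_X]
seeds = select_reliable(w, b, user_X, threshold=0.5)
final = propagate(user_X, seeds)
print("supervised only:", supervised)
print("combined:       ", final)
```

In this toy run the last user email falls just on the spam side of the public-data hyperplane, so the supervised model alone labels it spam; after propagation from the reliable seeds it is labeled ham, because it sits at the end of the user's ham cluster.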

Cite

Cheng, V., & Li, C. H. (2007). Combining supervised and semi-supervised classifier for personalized spam filtering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4426 LNAI, pp. 449–456). Springer Verlag. https://doi.org/10.1007/978-3-540-71701-0_45
