Combining winnow and orthogonal sparse bigrams for incremental spam filtering

50Citations
Citations of this article
38Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Spam filtering is a text categorization task that has attracted significant attention due to the increasingly huge amounts of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical, but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods is either limited in expressivity or imposes a large computational cost. We introduce orthogonal sparse bigrams (OSB) as a feature combination technique that overcomes both these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques we are able to reach an accuracy of 99.68% on a difficult test corpus, compared to 98.88% previously reported by the CRM114 classifier on the same test corpus. © Springer-Verlag Berlin Heidelberg 2004.

Cite

CITATION STYLE

APA

Siefkes, C., Assis, F., Chhabra, S., & Yerazunis, W. S. (2004). Combining winnow and orthogonal sparse bigrams for incremental spam filtering. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3202, 410–421. https://doi.org/10.1007/978-3-540-30116-5_38

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free