Spam filtering is a text categorization task that has attracted significant attention due to the rapidly growing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods is either limited in expressivity or imposes a large computational cost. We introduce orthogonal sparse bigrams (OSB) as a feature combination technique that overcomes both of these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques, we are able to reach an accuracy of 99.68% on a difficult test corpus, compared to 98.88% previously reported by the CRM114 classifier on the same test corpus. © Springer-Verlag Berlin Heidelberg 2004.
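To make the two ingredients named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that OSB pairs each token with each of the few tokens preceding it inside a sliding window and records how many tokens were skipped, and that Winnow keeps one positive weight per feature and updates weights multiplicatively only when it misclassifies a message. The names `osb_features` and `Winnow`, the window size, the promotion and demotion factors, the threshold, and the skip-marker format are all illustrative assumptions that the abstract does not specify.

```python
def osb_features(tokens, window=5):
    """Build sparse-bigram features: each token paired with each of the
    up to (window - 1) tokens before it, tagged with the number of
    tokens skipped in between. (Assumed formulation, for illustration.)"""
    feats = []
    for i, tok in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            # dist - 1 tokens lie between tokens[j] and tok
            feats.append(f"{tokens[j]} <skip:{dist - 1}> {tok}")
    return feats


class Winnow:
    """Mistake-driven linear classifier with multiplicative updates.
    All hyperparameter values here are illustrative assumptions."""

    def __init__(self, promotion=1.23, demotion=0.83, threshold=1.0):
        self.weights = {}  # feature -> weight; unseen features count as 1.0
        self.promotion = promotion
        self.demotion = demotion
        self.threshold = threshold

    def score(self, feats):
        # Average weight of the distinct active features.
        active = set(feats)
        if not active:
            return 0.0
        return sum(self.weights.get(f, 1.0) for f in active) / len(active)

    def predict(self, feats):
        # True means "spam".
        return self.score(feats) > self.threshold

    def train(self, feats, is_spam):
        # Update only on a mistake, which keeps training incremental.
        if self.predict(feats) == is_spam:
            return
        factor = self.promotion if is_spam else self.demotion
        for f in set(feats):
            self.weights[f] = self.weights.get(f, 1.0) * factor


clf = Winnow()
spam = osb_features("buy cheap pills now".split())
ham = osb_features("meeting agenda for monday".split())
clf.train(spam, True)
clf.train(ham, False)
```

Because weights change only after an error, a classifier of this kind can be trained one message at a time as mail arrives, which is the sense of "incremental" in the title of the cited paper.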
CITATION STYLE
Siefkes, C., Assis, F., Chhabra, S., & Yerazunis, W. S. (2004). Combining winnow and orthogonal sparse bigrams for incremental spam filtering. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3202, 410–421. https://doi.org/10.1007/978-3-540-30116-5_38