Spam filtering with Naive Bayes - Which Naive Bayes?

501Citations
Citations of this article
440Readers
Mendeley users who have this article in their library.

Abstract

Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes, and compare them on six new, non-encoded datasets, that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks, because they maintain the temporal order of the messages in the two categories, and they emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot roc curves that allow us to compare the different versions of nb over the entire tradeoff between true positives and true negatives.

Cite

CITATION STYLE

APA

Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam filtering with Naive Bayes - Which Naive Bayes? In 3rd Conference on Email and Anti-Spam - Proceedings, CEAS 2006. Conference on Email and Anti-Spam, CEAS.

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free