Spam filtering with Naive Bayes - Which Naive Bayes?

Vangelis Metsis; Ion Androutsopoulos; Georgios Paliouras

Conference Proceedings

3rd Conference on Email and Anti-Spam - Proceedings, CEAS 2006 (2006)

501 Citations

440 Readers

Abstract

Naive Bayes (NB) is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes and compare them on six new, non-encoded datasets that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks because they maintain the temporal order of the messages in the two categories and emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot ROC curves that allow us to compare the different versions of NB over the entire tradeoff between true positives and true negatives.
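As a rough illustration of the kind of procedure the abstract describes, the sketch below trains a single generic Naive Bayes variant (a multinomial model with add-one smoothing) one message at a time, then sweeps a decision threshold to list true-positive and true-negative rates. It is not the authors' code and does not reproduce their five variants or their datasets. The class name `IncrementalMultinomialNB`, the function `tradeoff_points`, and the toy messages are all hypothetical.

```python
import math
from collections import Counter


class IncrementalMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes, updated one message at a time."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def update(self, tokens, label):
        # Incremental training: fold a single labelled message into the counts.
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def spam_log_odds(self, tokens):
        # Assumes at least one spam and one ham message have been seen.
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        score = math.log(self.doc_counts["spam"] / self.doc_counts["ham"])
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            total = sum(self.token_counts[label].values())
            for token in tokens:
                if token in vocab:  # tokens never seen in training are ignored
                    count = self.token_counts[label][token]
                    # Add-one (Laplace) smoothing over the vocabulary.
                    score += sign * math.log((count + 1) / (total + len(vocab)))
        return score


def tradeoff_points(scores, labels):
    """Return (threshold, true-positive rate, true-negative rate) per distinct score.

    A message is classified as spam when its score is >= the threshold.
    """
    n_spam = labels.count("spam")
    n_ham = labels.count("ham")
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == "spam" and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == "ham" and s < threshold)
        points.append((threshold, tp / n_spam, tn / n_ham))
    return points


# Toy data, invented for illustration only.
nb = IncrementalMultinomialNB()
for tokens, label in [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "now"], "spam"),
    (["meeting", "schedule", "tomorrow"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]:
    nb.update(tokens, label)

test_messages = [["cheap", "buy"], ["meeting", "notes"]]
test_labels = ["spam", "ham"]
scores = [nb.spam_log_odds(m) for m in test_messages]
for point in tradeoff_points(scores, test_labels):
    print(point)
```

Each threshold gives one point on the tradeoff curve, so comparing classifiers means comparing whole curves rather than accuracy at a single fixed cutoff.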

Cite

Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam filtering with Naive Bayes - Which Naive Bayes? In 3rd Conference on Email and Anti-Spam - Proceedings, CEAS 2006. Conference on Email and Anti-Spam, CEAS.
