On word frequency information and negative evidence in naive Bayes text classification


Abstract

The Naive Bayes classifier exists in different versions. One version, called the multivariate Bernoulli or binary independence model, uses binary word occurrence vectors, while the multinomial model uses word frequency counts. Many publications cite this difference as the main reason for the superior performance of the multinomial Naive Bayes classifier. We argue that this is not true. We show that when all word frequency information is eliminated from the document vectors, the multinomial Naive Bayes model performs even better. Moreover, we argue that the main reason for the difference in performance is the way that negative evidence, i.e., evidence from words that do not occur in a document, is incorporated into the model. This paper therefore aims at a better understanding and a clarification of the difference between the two probabilistic models of Naive Bayes.
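The distinction the abstract draws can be sketched in a few lines of Python. The toy corpus and all counts below are invented for illustration only; the sketch trains both event models with Laplace smoothing and shows where they differ: the multinomial log-likelihood sums only over words that occur in a document, while the Bernoulli log-likelihood scores every vocabulary word, contributing log(1 − p) for each absent word — the "negative evidence" the paper discusses.

```python
import math

# Hypothetical toy corpus: word-count vectors over a 3-word vocabulary.
train = {
    "pos": [[2, 1, 0], [1, 2, 0]],
    "neg": [[0, 1, 2], [0, 0, 3]],
}

def multinomial_params(docs, V=3, alpha=1.0):
    # Laplace-smoothed word probabilities from summed word counts.
    totals = [sum(d[j] for d in docs) for j in range(V)]
    n_words = sum(totals)
    return [(totals[j] + alpha) / (n_words + alpha * V) for j in range(V)]

def bernoulli_params(docs, V=3, alpha=1.0):
    # Laplace-smoothed document frequencies (word occurs in a doc or not).
    df = [sum(1 for d in docs if d[j] > 0) for j in range(V)]
    n_docs = len(docs)
    return [(df[j] + alpha) / (n_docs + 2 * alpha) for j in range(V)]

def multinomial_loglik(x, p):
    # Only occurring words contribute; absent words (x[j] == 0) drop out.
    return sum(x[j] * math.log(p[j]) for j in range(len(x)) if x[j] > 0)

def bernoulli_loglik(x, p):
    # Every vocabulary word contributes; absent words add log(1 - p[j]),
    # i.e. explicit negative evidence.
    return sum(math.log(p[j]) if x[j] > 0 else math.log(1 - p[j])
               for j in range(len(x)))
```

Under the multinomial model an all-zero document scores 0 regardless of class, whereas under the Bernoulli model it still receives class-dependent mass from the log(1 − p) terms — which is why the two models can disagree even on identical binary input vectors.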

Citation (APA)

Schneider, K. M. (2004). On word frequency information and negative evidence in naive Bayes text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3230, pp. 474–485). Springer-Verlag. https://doi.org/10.1007/978-3-540-30228-5_42
