Smoothing multinomial naïve Bayes in the presence of imbalance

Alexander Y. Liu; Cheryl E. Martin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6871 LNAI 46-59

DOI: 10.1007/978-3-642-23199-5_4

Abstract

Multinomial naïve Bayes is a popular classifier used for a wide variety of applications. When applied to text classification, this classifier requires some form of smoothing when estimating parameters. Typically, Laplace smoothing is used, and researchers have proposed several other successful forms of smoothing. In this paper, we show that common preprocessing techniques for text categorization have detrimental effects when using several of these well-known smoothing methods. We also introduce a new form of smoothing for which these detrimental effects are less severe: ROSE smoothing, which can be derived from methods for cost-sensitive learning and imbalanced datasets. We show empirically on text data that ROSE smoothing performs well compared to known methods of smoothing, and is the only method tested that performs well regardless of the type of text preprocessing used. It is particularly effective compared to existing methods when the data is imbalanced. © 2011 Springer-Verlag.
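For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of multinomial naïve Bayes with Laplace (add-alpha) smoothing. It is not the paper's ROSE smoothing, whose details are not given in the abstract. The function names, the `alpha` parameter, and the toy documents are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and add-alpha smoothed log word likelihoods.

    docs: list of token lists; labels: one class label per document.
    alpha=1.0 corresponds to standard Laplace smoothing.
    """
    vocab = sorted({w for doc in docs for w in doc})
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c, counts in word_counts.items():
        # Smoothed denominator: class token count plus alpha per vocabulary word.
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like

def predict(doc, log_prior, log_like):
    """Return the class with the highest posterior log score.

    Tokens outside the training vocabulary are ignored.
    """
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical, deliberately imbalanced toy corpus (1 "spam" vs 3 "ham").
docs = [
    ["cheap", "pills"],
    ["meeting", "today"],
    ["project", "meeting"],
    ["lunch", "today"],
]
labels = ["spam", "ham", "ham", "ham"]

log_prior, log_like = train_multinomial_nb(docs, labels)
prediction = predict(["cheap", "pills"], log_prior, log_like)
```

Without smoothing, a word such as "meeting" that never occurs in the minority class would receive zero probability there and rule that class out for any document containing it. With few minority-class tokens, the added pseudo-counts make up a large share of that class's estimates, which is one reason the choice of smoothing can matter more under imbalance.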

Citation (APA)

Liu, A. Y., & Martin, C. E. (2011). Smoothing multinomial naïve Bayes in the presence of imbalance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6871 LNAI, pp. 46–59). https://doi.org/10.1007/978-3-642-23199-5_4
