Smoothing multinomial naïve Bayes in the presence of imbalance


Abstract

Multinomial naïve Bayes is a popular classifier used for a wide variety of applications. When applied to text classification, this classifier requires some form of smoothing when estimating parameters. Typically, Laplace smoothing is used, and researchers have proposed several other successful forms of smoothing. In this paper, we show that common preprocessing techniques for text categorization have detrimental effects when using several of these well-known smoothing methods. We also introduce a new form of smoothing for which these detrimental effects are less severe: ROSE smoothing, which can be derived from methods for cost-sensitive learning and imbalanced datasets. We show empirically on text data that ROSE smoothing performs well compared to known methods of smoothing, and is the only method tested that performs well regardless of the type of text preprocessing used. It is particularly effective compared to existing methods when the data is imbalanced. © 2011 Springer-Verlag.
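As background for the abstract's reference to Laplace smoothing, the following is a minimal sketch of additive (Laplace) smoothing in a multinomial naïve Bayes classifier. It does not implement ROSE smoothing, whose definition is not given in this abstract. The function names (`train_multinomial_nb`, `predict`), the `alpha` parameter, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and additively smoothed log word likelihoods.

    alpha=1.0 corresponds to Laplace (add-one) smoothing; it is a
    hypothetical parameter name chosen for this sketch.
    """
    vocab = sorted({w for doc in docs for w in doc})
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # Adding alpha to every count keeps unseen words from getting
        # probability zero, which would otherwise veto the whole class.
        denom = total + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """Return the class with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in doc if w in log_lik[c]
        )
    return max(scores, key=scores.get)

# Toy, deliberately imbalanced data: one "spam" document, three "ham".
docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["agenda", "notes"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "ham", "ham", "ham"]
log_prior, log_lik = train_multinomial_nb(docs, labels)
```

With only one minority-class document, the smoothing term accounts for a large share of each minority-class word estimate, which is the kind of setting the abstract identifies as problematic for standard smoothing.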

Cite

Liu, A. Y., & Martin, C. E. (2011). Smoothing multinomial naïve Bayes in the presence of imbalance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6871 LNAI, pp. 46–59). https://doi.org/10.1007/978-3-642-23199-5_4
