Improved Naive Bayes with optimal correlation factor for text classification


This article is free to access.

Abstract

The Naive Bayes (NB) estimator is widely used in text classification problems. However, it does not perform well with small training datasets. Most previous literature focuses on either creating and modifying features or combining NB with clustering to improve its performance. We tackle the problem directly by constructing a new estimator, called Naive Bayes with correlation factor. We introduce a correlation factor into the NB estimator that incorporates the overall correlation among the different classes. This effectively exploits the idea of bootstrapping: each training document is reused for all classes, even though it belongs to only one class. Moreover, we obtain a formula for the optimal correlation factor by balancing the bias and variance of the estimator. Experimental results on real-world data show that our estimator achieves better accuracy than traditional Naive Bayes while maintaining the simplicity of NB.
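
The abstract does not state the estimator's formula, so the following is only a rough sketch of the general idea of reusing every document for every class. It assumes a multinomial model in which documents from other classes contribute to a class's word counts with a weight `t`. The names `fit_shared_nb`, `predict`, `t`, and `alpha` are hypothetical, the weighting scheme is an illustrative assumption rather than the authors' estimator, and the paper's formula for the optimal factor is not reproduced here.

```python
import math

def fit_shared_nb(docs, labels, n_classes, t=0.0, alpha=1.0):
    """Fit a multinomial NB whose per-class word counts also borrow from other classes.

    docs: list of word-count vectors; labels: class index per document.
    t: hypothetical sharing weight (assumption, not from the paper);
       t=0 gives ordinary NB, t=1 pools all classes together.
    alpha: Laplace smoothing constant.
    Assumes every class has at least one training document.
    """
    n_words = len(docs[0])
    log_prior, log_theta = [], []
    for k in range(n_classes):
        counts = [alpha] * n_words
        n_in_class = 0
        for doc, label in zip(docs, labels):
            # Own-class documents count fully, other classes with weight t.
            weight = 1.0 if label == k else t
            n_in_class += (label == k)
            for j, c in enumerate(doc):
                counts[j] += weight * c
        total = sum(counts)
        log_theta.append([math.log(c / total) for c in counts])
        log_prior.append(math.log(n_in_class / len(docs)))
    return log_prior, log_theta

def predict(doc, log_prior, log_theta):
    """Return the class index with the highest log posterior score."""
    scores = [lp + sum(c * lt for c, lt in zip(doc, row))
              for lp, row in zip(log_prior, log_theta)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: three-word vocabulary, two classes, two documents each.
docs = [[3, 0, 1], [2, 1, 0], [0, 3, 1], [0, 2, 2]]
labels = [0, 0, 1, 1]
log_prior, log_theta = fit_shared_nb(docs, labels, n_classes=2, t=0.5)
label = predict([2, 0, 0], log_prior, log_theta)
```

In this sketch a larger `t` lowers the variance of the word-frequency estimates at the cost of bias toward the pooled distribution, which is the trade-off the abstract says the optimal correlation factor balances.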

Cite

Chen, J., Dai, Z., Duan, J., Matzinger, H., & Popescu, I. (2019). Improved Naive Bayes with optimal correlation factor for text classification. SN Applied Sciences, 1(9). https://doi.org/10.1007/s42452-019-1153-5
