We explain why the naive Bayesian classifier may dominate the proper one, as has been observed in clinical studies; cf. Gammerman and Thatcher (Methods of Information in Medicine, 30, 15-22, 1991). Today this effect may be of concern for real-time health care monitoring or surveillance. The dominance arises from a combination of three factors: the dimension of the state space (symptom space) given a disease, which is not fixed a priori; the feature selection procedure; and the parameter estimation. Estimating conditional probabilities in high dimensions with a proper Bayesian model can lead to overfitting, to a missing value problem, and, consequently, to a loss of classification accuracy. Owing to the curse of dimensionality, even big data sets may not compensate for this degradation.
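The dimensionality argument can be illustrated with a rough count. The following is a minimal sketch, not the paper's own calculation: it assumes every symptom is binary (an assumption made here for illustration only), and the function names are hypothetical. It compares how many conditional probabilities each model must estimate per disease and how thinly a data set is spread over the symptom combinations.

```python
def full_joint_params(d: int) -> int:
    """Free parameters of P(symptoms | disease) for d binary symptoms
    when no independence is assumed (the 'proper' model)."""
    return 2 ** d - 1


def naive_params(d: int) -> int:
    """Free parameters per disease under the naive assumption that
    symptoms are conditionally independent given the disease."""
    return d


def avg_cases_per_cell(n_cases: int, d: int) -> float:
    """Average number of cases per symptom combination if n_cases
    were spread evenly over all 2**d combinations."""
    return n_cases / 2 ** d


if __name__ == "__main__":
    n_cases = 1_000_000  # illustrative data set size, not from the paper
    for d in (5, 10, 20, 30):
        print(d, full_joint_params(d), naive_params(d),
              avg_cases_per_cell(n_cases, d))
```

Under these assumptions, a million cases for one disease already average fewer than one observation per symptom combination at 20 binary symptoms, so many cells of the proper model stay empty or are estimated from very few cases, while the naive model needs only 20 estimates.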
CITATION STYLE
Lenz, H. J. (2015). Why the naive Bayesian classifier for clinical diagnostics or monitoring can dominate the proper one even for massive data sets. In Frontiers in Statistical Quality Control 10 (Vol. 11, pp. 385–393). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-319-12355-4_23