Knowledge Discovery in Databases

Frank Beekmann

Book Chapter

Deutscher Universitätsverlag (2003), pp. 5–50

DOI: 10.1007/978-3-322-81227-8_2

Abstract

Many real-world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have a higher predictive accuracy over the majority class(es) but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on a combination of the SMOTE algorithm and the boosting procedure. Unlike standard boosting, where all misclassified examples are given equal weights, SMOTEBoost creates synthetic examples from the rare or minority class, thus indirectly changing the updating weights and compensating for skewed distributions. Applied to several highly and moderately imbalanced data sets, SMOTEBoost shows improved prediction performance on the minority class and improved overall F-values.
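The synthetic-example step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of SMOTE, in which a new point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the chapter; the boosting loop that SMOTEBoost wraps around this step is not shown.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Illustrative sketch only: names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.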

Citation (APA)

Beekmann, F. (2003). Knowledge Discovery in Databases. In Stichprobenbasierte Assoziationsanalyse im Rahmen des Knowledge Discovery in Databases (pp. 5–50). Deutscher Universitätsverlag. https://doi.org/10.1007/978-3-322-81227-8_2
