Abstract
Ensembles of classifiers have successfully been used to improve overall predictive accuracy in many domains. In particular, boosting, which focuses on hard-to-learn examples, is well suited to difficult learning problems. In a two-class imbalanced data set, the number of examples of the majority class is much higher than that of the minority class. As a result, the predictive accuracy of a traditional boosting ensemble against the minority class may be poor. This paper introduces an approach to address this shortcoming through the generation of synthetic examples, which are added to the original training set. In this way, the ensemble is able to focus not only on hard examples, but also on rare examples. The experimental results of applying our DataBoost-IM algorithm to eleven data sets indicate that it surpasses a benchmark individual classifier as well as a popular boosting method when evaluated in terms of the overall accuracy, the G-mean and the F-measure. © Springer-Verlag 2004.
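The abstract evaluates classifiers on imbalanced data using the G-mean and the F-measure rather than overall accuracy alone. As an illustration of these standard metrics (not code from the paper), the following sketch computes both from binary predictions, treating the minority class as the positive class:

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority-class recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Because the G-mean is the geometric mean of the accuracies on each class, a classifier that ignores the minority class entirely scores zero, which is why such metrics are preferred over plain accuracy for imbalanced problems.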
Viktor, H. L., & Guo, H. (2004). Multiple classifier prediction improvements against imbalanced datasets through added synthetic examples. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3138, 974–982. https://doi.org/10.1007/978-3-540-27868-9_107