Roughly balanced bagging for imbalanced data

Abstract

Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distributions. In our new sampling method, "Roughly Balanced Bagging" (RB Bagging), the numbers of samples in the largest and smallest classes differ within each subset, but they are effectively balanced when averaged over all subsets, which supports the approach of bagging in a more appropriate way. Our method differs from the existing bagging methods for imbalanced data, which draw exactly the same numbers of majority and minority examples for each sampled subset. In addition, our method makes full use of all of the minority examples by under-sampling, which is done efficiently using negative binomial distributions. Experiments on benchmark and real-world data sets show that RB Bagging outperforms the existing "balanced" methods and other common methods. Copyright © by SIAM.
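
The sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distribution's parameters, so the choices here are assumptions: the majority count for each subset is drawn as the number of failures before as many successes as there are minority examples, with success probability 0.5, and both classes are then sampled with replacement. The function names `draw_negative_binomial` and `roughly_balanced_sample` are hypothetical and are not taken from the paper.

```python
import random


def draw_negative_binomial(n_successes, p, rng):
    """Count the failures seen before n_successes successes occur."""
    failures = 0
    successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_sample(labels, minority_label, rng, p=0.5):
    """Return (minority indices, majority indices) for one bagging subset.

    Assumption: the minority sample size equals the minority class size,
    and the majority sample size is a negative binomial draw whose mean
    equals that size when p = 0.5.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_min = len(minority)
    n_maj = draw_negative_binomial(n_min, p, rng)
    # Both classes are sampled with replacement (an assumption of this sketch).
    picked_min = [rng.choice(minority) for _ in range(n_min)]
    picked_maj = [rng.choice(majority) for _ in range(n_maj)]
    return picked_min, picked_maj


rng = random.Random(0)
labels = [1] * 20 + [0] * 200  # 20 minority examples, 200 majority examples

majority_sizes = []
minority_sizes = []
for _ in range(2000):
    picked_min, picked_maj = roughly_balanced_sample(labels, 1, rng)
    minority_sizes.append(len(picked_min))
    majority_sizes.append(len(picked_maj))

mean_majority = sum(majority_sizes) / len(majority_sizes)
print("mean majority sample size:", mean_majority)
print("distinct majority sizes seen:", len(set(majority_sizes)))
```

Under these assumptions the majority sample size varies from subset to subset while its average stays close to the minority class size, which is the "roughly balanced" property the abstract contrasts with drawing exactly equal class counts. A base classifier would then be trained on each subset and the predictions aggregated as in ordinary bagging.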

Cite

Hido, S., & Kashima, H. (2008). Roughly balanced bagging for imbalanced data. In Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130 (Vol. 1, pp. 143–152). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611972788.13
