Surrounding neighborhood-based SMOTE for learning from imbalanced data sets

Abstract

Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications this assumption is grossly violated: the ratios of prior probabilities between classes are often extremely skewed. This situation is known as the class imbalance problem. One strategy to tackle it consists of balancing the classes by resampling the original data set. The SMOTE algorithm is probably the most popular technique for increasing the size of the minority class by generating synthetic instances. Building on the idea of the original SMOTE, we propose three surrounding-neighborhood approaches for generating artificial minority instances that take into account both the proximity and the spatial distribution of the examples. Experiments over a large collection of databases and using three different classifiers demonstrate that the new surrounding neighborhood-based SMOTE procedures significantly outperform other existing over-sampling algorithms. © 2012 Springer-Verlag Berlin Heidelberg.
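As a rough illustration of the interpolation step the abstract refers to, the sketch below implements the classical SMOTE generation rule in Python. The function name `smote_oversample` and its parameters are illustrative, not taken from the paper; the surrounding-neighborhood variants proposed by the authors would replace the plain k-nearest-neighbor selection with a neighborhood definition that also reflects the spatial distribution of the examples.

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    X_min       : (n, d) array of minority-class instances.
    n_synthetic : number of artificial instances to create.
    k           : number of nearest minority neighbors to interpolate towards.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)

    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude each point itself
    knn = np.argsort(d2, axis=1)[:, :k]          # indices of k nearest neighbors

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                      # pick a random minority seed
        nb = X_min[rng.choice(knn[j])]           # one of its nearest neighbors
        gap = rng.random()                       # interpolation factor in [0, 1)
        # New instance lies on the segment between the seed and its neighbor.
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic
```

In a typical workflow, the returned synthetic points would be appended to the minority class before training, so that the classifier sees a more balanced class distribution.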

Cite

APA

García, V., Sánchez, J. S., Martín-Félez, R., & Mollineda, R. A. (2012). Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Progress in Artificial Intelligence, 1(4), 347–362. https://doi.org/10.1007/s13748-012-0027-5
