Discovering minority sub-clusters and local difficulty factors from imbalanced data

Mateusz Lango; Dariusz Brzezinski; Sebastian Firlik; Jerzy Stefanowski

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10558 LNAI, pp. 324–339 (2017)

DOI: 10.1007/978-3-319-67786-6_23

Abstract

Learning classifiers from imbalanced data is particularly challenging when class imbalance is accompanied by local data difficulty factors, such as outliers, rare cases, class overlapping, or minority class decomposition. Although these issues have been highlighted in previous research, there have been no proposals of algorithms that simultaneously detect all the aforementioned difficulties in a dataset. In this paper, we put forward two extensions to popular clustering algorithms, ImKmeans and ImScan, and one novel algorithm, ImGrid, that attempt to detect minority sub-clusters, outliers, rare cases, and class overlapping. Experiments with artificial datasets show that ImGrid, which uses a Bayesian test to join similar neighboring regions, is able to re-discover simulated clusters and types of minority examples on par with competing methods, while being the least sensitive to parameter tuning.

Citation (APA)

Lango, M., Brzezinski, D., Firlik, S., & Stefanowski, J. (2017). Discovering minority sub-clusters and local difficulty factors from imbalanced data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10558 LNAI, pp. 324–339). Springer Verlag. https://doi.org/10.1007/978-3-319-67786-6_23
