PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data

Yong Seok Jeon; Dong Joon Lim

Journal Article, Open Access

IEEE Access (2020) 8 131920-131927

DOI: 10.1109/ACCESS.2020.3009753

18 Citations

41 Readers

Abstract

Imbalanced classes are a common problem in machine learning, and the computational cost of proper resampling increases with data size. In this study, a simple and effective undersampling method, named particle stacking undersampling (PSU), was proposed. Compared with competing undersampling methods, PSU can significantly reduce computational costs while minimizing information loss to prevent prediction bias. A performance benchmark on 55 binary classification problems indicated that the proposed method not only achieved better classification performance than other well-known undersampling methods (random undersampling, NearMiss-1, NearMiss-2, cluster centroid, edited nearest neighbor, condensed nearest neighbor, and Tomek Links) but also offered a computational simplicity that scales to large data. Moreover, an experiment verified that the two propositions forming the basis of the PSU algorithm can also be applied to other undersampling methods to achieve methodological improvements.
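For readers unfamiliar with undersampling, the simplest baseline named in the abstract, random undersampling, can be sketched as follows. This is a minimal illustration and not the PSU algorithm, whose details the abstract does not give. The function name `random_undersample`, the toy data, and the choice to balance the classes exactly 1:1 are assumptions made for this example.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration only; assumes binary labels.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    # The rarer label is treated as the minority class.
    minority_label, n_minority = min(counts.items(), key=lambda kv: kv[1])
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    # Keep every minority row and an equally sized random subset of majority rows.
    kept_majority = rng.sample(majority_idx, n_minority)
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed): 95 majority rows (label 0) and 5 minority rows (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5
X_res, y_res = random_undersample(X, y)
```

Because the majority rows are discarded at random, informative samples may be lost, which is the kind of information loss the abstract says PSU aims to minimize.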

Cite

Jeon, Y. S., & Lim, D. J. (2020). PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data. IEEE Access, 8, 131920–131927. https://doi.org/10.1109/ACCESS.2020.3009753
