Imbalanced data classification has been one of the active topics in data mining and machine learning in recent years. Imbalanced data are common in practice, for example in cancer detection, spam filtering, and credit card fraud detection. Because the classes differ greatly in size and the class distribution is skewed, traditional classification algorithms perform poorly on the minority class, even though correctly identifying minority-class samples often brings greater value. How to identify the minority class effectively in imbalanced data is therefore of great practical significance. Existing Bagging-based imbalanced classification methods may add redundant noise information, and their sampling step cannot guarantee that a valid classification boundary still exists afterwards. To address these problems, an ensemble learning method based on sample combination optimization, GABagging, is proposed. First, the sample combination optimization algorithm uses a genetic algorithm to select a subset of the majority class and combines it with the minority class to construct a new data set. Then, the sample combination optimization algorithm is applied several times to produce training sets on which several classifiers are trained and integrated into an ensemble. Experimental results on 19 imbalanced data sets show that, compared with similar methods, GABagging improves the correct recognition of the minority class as measured by TPR and AUC, without excessive loss of recognition ability on the majority class. This demonstrates that GABagging can compensate for the shortcomings of related Bagging-based methods, such as easily losing information, increasing the number of samples, and not guaranteeing that a valid classification boundary exists after sampling.
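The two-step procedure described above (genetic-algorithm selection of a majority-class subset, then an ensemble of classifiers trained on the resulting balanced sets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not specify the chromosome encoding, fitness function, GA operators, base classifier, or combination rule, so the fixed-size index encoding, the G-mean fitness, truncation selection, the nearest-centroid base learner (`NearestCentroid`), and majority voting (`vote`) are all hypothetical choices. Labels are assumed to be 0 for the majority class and 1 for the minority class.

```python
import random
from statistics import mean

# Assumption: label 0 = majority class, label 1 = minority class.

def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*points)]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

class NearestCentroid:
    """Hypothetical stand-in base learner; the abstract does not name GABagging's."""
    def fit(self, X, y):
        self.centroids = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))

def g_mean(model, X, y):
    """Geometric mean of TNR (class 0) and TPR (class 1) on (X, y)."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, t in enumerate(y) if t == c]
        recalls.append(sum(model.predict(X[i]) == c for i in idx) / len(idx))
    return (recalls[0] * recalls[1]) ** 0.5

def fit_on(X, y, idx):
    return NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx])

def ga_select(X, y, maj, mino, rng, pop_size=10, generations=15, p_mut=0.3):
    """Toy GA over majority subsets; a chromosome is a list of k distinct indices."""
    k = len(mino)

    def fitness(subset):
        # Hypothetical fitness: G-mean on the full training set of a model
        # trained on the candidate majority subset plus all minority samples.
        return g_mean(fit_on(X, y, subset + mino), X, y)

    pop = [rng.sample(maj, k) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = rng.sample(sorted(set(a) | set(b)), k)  # crossover: k genes drawn from both parents
            if rng.random() < p_mut:  # mutation: swap in one unused majority index
                unused = [i for i in maj if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def ga_bagging(X, y, n_estimators=5, seed=0):
    rng = random.Random(seed)
    maj = [i for i, t in enumerate(y) if t == 0]
    mino = [i for i, t in enumerate(y) if t == 1]
    # Each member gets its own GA run (different random start) and a balanced training set.
    return [fit_on(X, y, ga_select(X, y, maj, mino, rng) + mino) for _ in range(n_estimators)]

def vote(models, x):
    """Majority vote over ensemble members; ties go to the majority class."""
    return int(2 * sum(m.predict(x) for m in models) > len(models))

if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(40)]
    X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(8)]
    y = [0] * 40 + [1] * 8
    models = ga_bagging(X, y)
    print(vote(models, [3.0, 3.0]), vote(models, [0.0, 0.0]))
```

Because each GA run starts from a different random population, ensemble members see different majority subsets, which is one plausible source of diversity in this sketch; a closer analogue of classic Bagging would also draw a bootstrap sample before each GA run.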
Wang, Y. (2019). An Ensemble Learning Imbalanced Data Classification Method Based on Sample Combination Optimization. In Journal of Physics: Conference Series (Vol. 1284). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1284/1/012035