Bagging of Xgboost Classifiers with Random Under-sampling and Tomek Link for Noisy Label-imbalanced Data

Abstract

Fitting label-imbalanced data with a high level of noise is one of the major challenges in the design of learning-based intelligent systems. In this paper, for the two-class problem, we propose a bagging-based algorithm that combines the Xgboost classifier (a Gradient Boosting Machine) with under-sampling approaches to address this challenge. To avoid model misspecification caused by imbalanced data, random sampling with replacement is employed to obtain several balanced training sets. To mitigate the misleading information produced by noise, the Tomek Link method is introduced to eliminate cross-class overlapping instances, which are the primary sources of noise. To obtain robust individual learners, we use Xgboost, a novel Gradient Boosting Machine-based classifier with a convenient parameter-tuning interface, to fit each component of the bagging ensemble. The performance of the proposed method is tested on Mandarin radio recordings (MFCC features) in a keyword recognition task, and the experimental results show that the new method can outperform a single Xgboost classifier, which supports the rationality and effectiveness of the bagging scheme. The proposed method offers a novel solution to the challenge of classifying noisy, imbalanced data, and the application of Xgboost in this area is itself an innovative contribution.
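The pipeline described in the abstract (balanced resampling, Tomek Link cleaning, then one learner per bag) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. Several details are assumptions made here for illustration: the minority class is kept whole while the majority class is sampled with replacement down to the same size, both members of each Tomek link are dropped, and the bags are combined by majority vote. `CentroidClassifier` is a hypothetical stand-in base learner, not Xgboost; any object exposing `fit` and `predict` (for example, a wrapper around an Xgboost model) could be passed as `make_learner` instead.

```python
import math
import random
from collections import Counter


def balanced_sample(X, y, rng):
    """Keep all minority rows; draw as many majority rows with replacement (assumption)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    idx = min_idx + rng.choices(maj_idx, k=len(min_idx))
    return [X[i] for i in idx], [y[i] for i in idx]


def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (excluding i), by Euclidean distance."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    linked = {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in linked]
    return [X[i] for i in keep], [y[i] for i in keep]


class CentroidClassifier:
    """Hypothetical stand-in base learner (NOT Xgboost): nearest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]


def fit_bagging(X, y, make_learner, n_estimators=10, seed=0):
    """Train one learner per balanced, Tomek-cleaned bag."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        Xb, yb = balanced_sample(X, y, rng)
        Xb, yb = remove_tomek_links(Xb, yb)
        models.append(make_learner().fit(Xb, yb))
    return models


def predict_bagging(models, X):
    """Combine the bags by majority vote (assumption; the abstract does not say how)."""
    votes = zip(*(m.predict(X) for m in models))
    return [Counter(v).most_common(1)[0][0] for v in votes]
```

A call such as `fit_bagging(X, y, CentroidClassifier, n_estimators=5)` followed by `predict_bagging(models, X_new)` exercises the whole scheme. The brute-force neighbor search is quadratic in the bag size, which is acceptable for a sketch because each bag is already under-sampled.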

Cite

Ruisen, L., Songyi, D., Chen, W., Peng, C., Zuodong, T., Yanmei, Y., & Shixiong, W. (2018). Bagging of Xgboost Classifiers with Random Under-sampling and Tomek Link for Noisy Label-imbalanced Data. In IOP Conference Series: Materials Science and Engineering (Vol. 428). Institute of Physics Publishing. https://doi.org/10.1088/1757-899X/428/1/012004
