Abstract
In recent years, classification with imbalanced data has attracted growing attention in the data-mining and machine-learning communities because imbalanced data have become abundant in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. The classification methods are random forest, support vector machine, logistic regression, k-nearest neighbors (KNN), decision tree, and AdaBoost. Our study shows that all of the classification methods have difficulty with the imbalanced data; KNN performs the worst, detecting only 27.4% of the minority class. With the help of resampling techniques, however, all of the classification methods improve in overall performance. In particular, random forest combined with random over-sampling performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and the true-negative rate).
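To make the two central notions concrete, the sketch below shows one way to compute balanced accuracy as defined above and one simple form of random over-sampling (duplicating randomly chosen minority examples until the two classes are the same size). It is an illustration only, not the authors' implementation: the function names `balanced_accuracy` and `random_over_sample`, the convention that label 1 marks the minority class, and the toy data are all assumptions.

```python
import random


def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of the true-positive rate and the true-negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    tpr = tp / (tp + fn)  # share of the positive class that is detected
    tnr = tn / (tn + fp)  # share of the negative class that is detected
    return (tpr + tnr) / 2


def random_over_sample(X, y, minority=1, seed=0):
    """Duplicate random minority rows until both classes have equal size.

    Assumes a binary problem; `minority` is the label of the rarer class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    if not minority_idx:
        raise ValueError("no minority examples to sample from")
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = list(range(len(y))) + extra  # original rows plus duplicates
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): one minority example among five rows.
X = [[0.1], [0.9], [1.1], [1.3], [1.5]]
y = [1, 0, 0, 0, 0]
X_res, y_res = random_over_sample(X, y)
```

Balanced accuracy suits this setting because plain accuracy can look high on imbalanced data even when a classifier misses most of the minority class. In practice, resampling would be applied to the training split only, so that the test data keep their original class proportions.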
Cite
Nguyen, S., Niu, G., Quinn, J., Olinsky, A., Ormsbee, J., Smith, R. M., & Bishop, J. (2019). DETECTING NON-INJURED PASSENGERS AND DRIVERS IN CAR ACCIDENTS: A NEW UNDER-RESAMPLING METHOD FOR IMBALANCED CLASSIFICATION. In Advances in Business and Management Forecasting (Vol. 13, pp. 93–105). Emerald Publishing. https://doi.org/10.1108/S1477-407020190000013011