The performance of traditional classifier skews towards the majority class for imbalanced data, resulting in high misclassification rate for minority samples. To solve this problem, a combination classification algorithm based on outlier detection and C4.5 is presented. The basic idea of the algorithm is to make the data distribution balance by grouping the whole data into rare clusters and major clusters through the outlier factor. Then C4.5 algorithm is implemented to build the decision trees on both the rare clusters and the major clusters respectively. When classifying a new object, the decision tree for evaluation will be chosen according to the type of the cluster which the new object is nearest. We use the datasets from the UCI Machine Learning Repository to perform the experiments and compare the effects with other classification algorithms; the experiments demonstrate that our algorithm performs much better for the extremely imbalanced data sets. © 2009 Springer.
CITATION STYLE
Jiang, S. Y., & Yu, W. (2009). A combination classification algorithm based on outlier detection and C4.5. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5678 LNAI, pp. 504–511). https://doi.org/10.1007/978-3-642-03348-3_50
Mendeley helps you to discover research relevant for your work.