Building decision trees for the multi-class imbalance problem

Abstract

Learning from imbalanced datasets is a pervasive problem in a wide variety of real-world applications. In imbalanced datasets, the class of interest generally accounts for only a small fraction of the instances, yet misclassifying those instances is often expensive. While there is a significant body of research on the class imbalance problem for binary-class datasets, multi-class datasets have received considerably less attention. This is partly because the multi-class imbalance problem is often much harder than its binary-class counterpart, as the relative frequency and cost of each class can vary widely from dataset to dataset. In this paper we study the multi-class imbalance problem as it relates to decision trees (specifically C4.4 and HDDT) and develop a new multi-class splitting criterion. Our experiments show that multi-class Hellinger distance decision trees, when combined with decomposition techniques, outperform C4.4.
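
For readers unfamiliar with HDDT, the two-class Hellinger distance splitting criterion from the earlier HDDT work by Cieslak and Chawla (which this paper extends to the multi-class setting) can be sketched as follows. The notation below (classes + and -, and p partitions of the data induced by a candidate split) is ours; the paper's actual multi-class criterion is defined in the full text, not in this abstract:

    d_H(X_+, X_-) = \sqrt{ \sum_{j=1}^{p} \left( \sqrt{ |X_{+j}| / |X_+| } - \sqrt{ |X_{-j}| / |X_-| } \right)^2 }

The split maximizing d_H is chosen. Because each class's counts are normalized by that class's own size, the criterion does not depend on the class priors; this skew-insensitivity is what motivates extending Hellinger-distance splitting beyond two classes, for example by decomposing a multi-class problem into binary subproblems.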

Citation (APA)

Hoens, T. R., Qian, Q., Chawla, N. V., & Zhou, Z. H. (2012). Building decision trees for the multi-class imbalance problem. In Lecture Notes in Computer Science (Vol. 7301 LNAI, pp. 122–134). https://doi.org/10.1007/978-3-642-30217-6_11
