Technical Note: Bias in Information-Based Measures in Decision Tree Induction

Abstract

A fresh look is taken at the problem of bias in information-based attribute selection measures used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.

© 1994, Kluwer Academic Publishers. All rights reserved.
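The kind of bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental design: the sample size (600 cases), the two equiprobable classes, the uniformly distributed attribute levels, and all function names are assumptions chosen for illustration. Attributes are generated independently of the class, so any apparent information gain is pure noise. The mean information gain should nevertheless grow with the number of attribute values, whereas the Pearson chi-square statistic divided by its degrees of freedom should stay close to 1, which is the sense in which referring the statistic to the chi-square distribution compensates for the number of levels.

```python
import math
import random
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(attr, cls):
    """Class entropy minus the expected class entropy after splitting on attr."""
    n = len(cls)
    class_counts = Counter(cls)
    attr_counts = Counter(attr)
    joint = Counter(zip(attr, cls))
    h_cond = 0.0
    for a, n_a in attr_counts.items():
        branch = [joint[(a, c)] for c in class_counts]
        h_cond += (n_a / n) * entropy(branch, n_a)
    return entropy(class_counts.values(), n) - h_cond


def chi_square(attr, cls):
    """Pearson chi-square statistic and degrees of freedom for attr vs. class."""
    n = len(cls)
    class_counts = Counter(cls)
    attr_counts = Counter(attr)
    joint = Counter(zip(attr, cls))
    stat = 0.0
    for a, n_a in attr_counts.items():
        for c, n_c in class_counts.items():
            expected = n_a * n_c / n  # positive: only observed levels are iterated
            stat += (joint[(a, c)] - expected) ** 2 / expected
    df = (len(attr_counts) - 1) * (len(class_counts) - 1)
    return stat, df


def simulate(num_values, n_cases=600, n_classes=2, trials=400, seed=0):
    """Mean information gain and mean chi-square/df when attr is pure noise."""
    rng = random.Random(seed)
    gains, ratios = [], []
    for _ in range(trials):
        # Attribute and class are drawn independently (the null hypothesis).
        attr = [rng.randrange(num_values) for _ in range(n_cases)]
        cls = [rng.randrange(n_classes) for _ in range(n_cases)]
        gains.append(information_gain(attr, cls))
        stat, df = chi_square(attr, cls)
        if df > 0:
            ratios.append(stat / df)
    mean_ratio = sum(ratios) / len(ratios) if ratios else float("nan")
    return sum(gains) / len(gains), mean_ratio


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        gain, ratio = simulate(k)
        print(f"values={k:2d}  mean gain={gain:.4f} bits  mean chi2/df={ratio:.3f}")
```

Running the script prints one row per attribute cardinality. Comparing raw gains across rows would favour the many-valued attribute even though none of them carries information about the class, while the chi-square column is already on a scale that accounts for the number of levels.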

Cite

White, A. P., & Liu, W. Z. (1994). Technical Note: Bias in Information-Based Measures in Decision Tree Induction. Machine Learning, 15(3), 321–329. https://doi.org/10.1023/A:1022694010754
