Abstract
A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take. © 1994, Kluwer Academic Publishers. All rights reserved.
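The bias claimed in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design: the sample size, number of classes, trial count, and the function names `information_gain` and `mean_null_gain` are all assumptions chosen for illustration. Each trial draws an attribute that is statistically independent of the class, so an unbiased measure would score it the same on average regardless of how many values the attribute takes.

```python
import math
import random
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(attr, labels):
    """Information gain (bits) from splitting `labels` by the values in `attr`."""
    n = len(labels)
    base = entropy(Counter(labels).values(), n)
    # Group the class labels by attribute value.
    by_value = {}
    for a, y in zip(attr, labels):
        by_value.setdefault(a, []).append(y)
    # Weighted entropy of the class within each branch of the split.
    remainder = sum(
        len(ys) / n * entropy(Counter(ys).values(), len(ys))
        for ys in by_value.values()
    )
    return base - remainder


def mean_null_gain(n_values, n_cases=200, n_classes=2, trials=300, seed=0):
    """Average gain of a random attribute that is independent of the class.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randrange(n_classes) for _ in range(n_cases)]
        attr = [rng.randrange(n_values) for _ in range(n_cases)]
        total += information_gain(attr, labels)
    return total / trials


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, round(mean_null_gain(k), 4))
```

Under these assumptions the average gain of a pure-noise attribute rises with the number of values it takes. One way to see why a chi-square approach compensates: the quantity 2N ln 2 times the gain (in bits) is the likelihood-ratio statistic G, which under independence is approximately chi-square distributed with (k − 1)(c − 1) degrees of freedom for k attribute values and c classes, so referring it to that distribution adjusts for k automatically.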
Citation
White, A. P., & Liu, W. Z. (1994). Technical Note: Bias in Information-Based Measures in Decision Tree Induction. Machine Learning, 15(3), 321–329. https://doi.org/10.1023/A:1022694010754