Efficient multisplitting on numerical data

Tapio Elomaa; Juho Rousu

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (1997) 1263 178-188

DOI: 10.1007/3-540-63223-9_117

Abstract

Numerical data poses a problem to symbolic learning methods, since numerical value ranges inherently need to be partitioned into intervals for representation and handling. An evaluation function is used to approximate the goodness of different partition candidates. Most existing methods for multisplitting on numerical attributes are based on heuristics, because of the apparent efficiency advantages. We characterize a class of well-behaved cumulative evaluation functions for which efficient discovery of the optimal multisplit is possible by dynamic programming. A single pass through the data suffices to evaluate multisplits of all arities. This class contains many important attribute evaluation functions familiar from symbolic machine learning research. Our empirical experiments show that there are no significant differences in efficiency between the method that produces optimal partitions and those that are based on heuristics. Moreover, we demonstrate that optimal multisplitting can be beneficial in decision tree learning in contrast to using the much applied binarization of numerical attributes or heuristic multisplitting.
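
The dynamic programming idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm as published: the function names (`interval_cost`, `optimal_multisplits`), the choice of size-weighted class entropy as the cumulative evaluation function, and the placement of cut points at midpoints between adjacent distinct values are all assumptions made for illustration. The sketch relies only on the property that the score of a partition is the sum of the scores of its intervals, so the best k-interval split of a prefix can be built from the best (k-1)-interval splits of shorter prefixes.

```python
import math
from collections import Counter
from itertools import groupby


def interval_cost(counts):
    """Size-weighted class entropy n * H of one interval (assumed evaluation function)."""
    n = sum(counts.values())
    return -sum(c * math.log2(c / n) for c in counts.values() if c > 0)


def optimal_multisplits(values, labels, max_arity):
    """Return {arity: (cost, cut_points)} for every arity from 1 to max_arity."""
    # Examples with the same attribute value cannot be separated,
    # so group them into bins; cuts fall only between bins.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    bins = [(v, Counter(lbl for _, lbl in grp))
            for v, grp in groupby(pairs, key=lambda p: p[0])]
    m = len(bins)
    max_arity = min(max_arity, m)

    # prefix[i] holds the class counts of the first i bins.
    prefix = [Counter()]
    for _, cnt in bins:
        prefix.append(prefix[-1] + cnt)

    def cost(j, i):
        # Cost of the single interval that covers bins j .. i-1.
        return interval_cost(prefix[i] - prefix[j])

    inf = float("inf")
    # best[k][i]: minimal total cost of splitting the first i bins into k intervals.
    best = [[inf] * (m + 1) for _ in range(max_arity + 1)]
    back = [[0] * (m + 1) for _ in range(max_arity + 1)]
    best[0][0] = 0.0
    for k in range(1, max_arity + 1):
        for i in range(k, m + 1):
            for j in range(k - 1, i):
                c = best[k - 1][j] + cost(j, i)
                if c < best[k][i]:
                    best[k][i], back[k][i] = c, j

    # The same table yields the optimal split of every arity up to max_arity.
    result = {}
    for k in range(1, max_arity + 1):
        cuts, i = [], m
        for kk in range(k, 1, -1):
            j = back[kk][i]
            cuts.append((bins[j - 1][0] + bins[j][0]) / 2)
            i = j
        result[k] = (best[k][m], sorted(cuts))
    return result
```

Calling `optimal_multisplits([1, 2, 3, 4, 5, 6], list("aabbaa"), 3)` fills one table and reads off the best split for arities 1, 2 and 3 from it. This plain version takes time quadratic in the number of distinct values per arity; it makes no attempt to reproduce the efficiency results reported in the paper.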

Citation (APA)

Elomaa, T., & Rousu, J. (1997). Efficient multisplitting on numerical data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1263, pp. 178–188). Springer Verlag. https://doi.org/10.1007/3-540-63223-9_117
