The explosive growth of data brings growing challenges and opportunities to data mining. In data mining, learning decision trees is a common method, and determining split points is its key problem. Existing methods for calculating split points on large data in a distributed setting either (1) incur high communication overhead or (2) do not work universally across different levels of skewness in the data distribution. In this paper, we study the properties of Gini impurity, a measure used to determine split points, and design new algorithms for calculating split points in MapReduce. Empirical evaluation demonstrates that our method outperforms existing state-of-the-art techniques in communication cost and universality. © Springer-Verlag 2013.
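For readers unfamiliar with the measure, the sketch below shows how Gini impurity, $1 - \sum_k p_k^2$, is typically used to pick a split point on a numeric attribute, with a MapReduce-style map and reduce phase simulated in plain Python. It is a naive baseline, not the algorithm proposed in the paper: the helpers `map_partition`, `reduce_counts`, and `best_split` are hypothetical names. The mapper ships one count per distinct (value, class) pair to the reducer, which illustrates how communication can grow with the number of distinct attribute values.

```python
from collections import Counter


def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 for a mapping class -> count."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def map_partition(records):
    """Hypothetical map phase: count (attribute value, class label) pairs locally."""
    return Counter((value, label) for value, label in records)


def reduce_counts(partials):
    """Hypothetical reduce phase: merge per-partition counts into global counts."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total


def best_split(counts):
    """Scan the sorted distinct values and return (threshold, weighted Gini)
    for the split 'value <= threshold' vs 'value > threshold'."""
    labels = {label for _, label in counts}
    values = sorted({value for value, _ in counts})
    # Start with every record on the right-hand side of the split.
    right = Counter()
    for (_, label), c in counts.items():
        right[label] += c
    left = Counter()
    n = sum(right.values())
    best = (None, float("inf"))
    # Splitting after the largest value would leave the right side empty.
    for v in values[:-1]:
        # Move all records with value v from the right side to the left side.
        for label in labels:
            c = counts.get((v, label), 0)
            left[label] += c
            right[label] -= c
        n_left = sum(left.values())
        n_right = n - n_left
        score = (n_left / n) * gini(left) + (n_right / n) * gini(right)
        if score < best[1]:
            best = (v, score)
    return best


if __name__ == "__main__":
    partitions = [
        [(1.0, "a"), (2.0, "a"), (3.0, "b")],
        [(2.0, "a"), (4.0, "b"), (5.0, "b")],
    ]
    counts = reduce_counts(map_partition(p) for p in partitions)
    print(best_split(counts))
```

In this toy data, splitting at `value <= 2.0` separates the classes perfectly, so the weighted impurity is 0.0. A single sequential scan suffices once the global counts are known. The costly part in a distributed setting is getting those counts to one place.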
Zhu, M., Shen, D., Yu, G., Kou, Y., & Nie, T. (2013). Computing the split points for learning decision tree in MapReduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 339–353). https://doi.org/10.1007/978-3-642-37450-0_26