Computing the split points for learning decision tree in MapReduce

Abstract

The explosive growth of data brings both challenges and opportunities to data mining. Decision tree learning is a common data-mining method, and determining split points is its key problem. Existing methods for calculating split points on large data in a distributed setting either (1) incur high communication overhead or (2) do not generalize across different levels of skew in the data distribution. In this paper, we study the properties of Gini impurity, a measure for determining split points, and design new algorithms for calculating split points in MapReduce. Empirical evaluation demonstrates that our method outperforms existing state-of-the-art techniques in communication cost and universality. © Springer-Verlag 2013.
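To make the split-point problem concrete, the sketch below shows a naive baseline, not the authors' algorithm (the abstract does not describe it). It computes Gini impurity, Gini(D) = 1 - sum_k p_k^2, and picks the numeric threshold that minimizes the weighted impurity |D_L|/|D| * Gini(D_L) + |D_R|/|D| * Gini(D_R). The map/reduce steps are simulated in plain Python, and the function names (`map_partition`, `reduce_histograms`, `best_split`) are hypothetical. Because every mapper ships a full per-value class histogram, this baseline also illustrates the communication cost the paper sets out to reduce.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 of a class-count mapping."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def map_partition(records):
    """Hypothetical map step: build value -> class-count histogram for one partition."""
    hist = defaultdict(Counter)
    for value, label in records:
        hist[value][label] += 1
    return hist


def reduce_histograms(partials):
    """Hypothetical reduce step: merge the partial histograms from all mappers."""
    merged = defaultdict(Counter)
    for hist in partials:
        for value, counts in hist.items():
            merged[value].update(counts)
    return merged


def best_split(hist):
    """Return (threshold, weighted Gini) for the best split of the form x <= threshold."""
    values = sorted(hist)
    total = Counter()
    for counts in hist.values():
        total.update(counts)
    n = sum(total.values())
    left = Counter()
    best = (None, float("inf"))
    # Splitting after the largest value would leave the right side empty, so skip it.
    for v in values[:-1]:
        left.update(hist[v])
        right = total - left  # class counts of records with x > v
        n_left = sum(left.values())
        n_right = n - n_left
        score = (n_left / n) * gini(left) + (n_right / n) * gini(right)
        if score < best[1]:
            best = (v, score)
    return best
```

For example, with two partitions `[(1, 'a'), (2, 'a'), (3, 'b')]` and `[(2, 'a'), (4, 'b'), (5, 'b')]`, merging the mapper outputs and calling `best_split` yields `(2, 0.0)`: the threshold `x <= 2` separates the classes perfectly.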

Citation (APA)

Zhu, M., Shen, D., Yu, G., Kou, Y., & Nie, T. (2013). Computing the split points for learning decision tree in MapReduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 339–353). https://doi.org/10.1007/978-3-642-37450-0_26