Computing the split points for learning decision tree in MapReduce

Mingdong Zhu; Derong Shen; Ge Yu; Yue Kou; Tiezheng Nie

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7826 LNCS (Part 2), pp. 339–353 (2013)

DOI: 10.1007/978-3-642-37450-0_26

Abstract

The explosive growth of data brings ever more challenges and opportunities to data mining. Learning decision trees is a common data mining method, and determining split points is its key problem. Existing methods for calculating split points on large data in a distributed setting either (1) incur high communication overhead or (2) do not generalize across different levels of skew in the data distribution. In this paper, we study the properties of Gini impurity, a measure used to determine split points, and design new algorithms for calculating split points in MapReduce. Empirical evaluation demonstrates that our method outperforms existing state-of-the-art techniques in communication cost and universality. © Springer-Verlag 2013.
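
For readers unfamiliar with the measure named in the abstract, the following is a minimal single-machine sketch of how Gini impurity can be used to pick a split point on one numeric attribute. It is not the paper's MapReduce algorithm, which the abstract does not describe; the function names `gini` and `best_split` and the midpoint-threshold convention are assumptions made purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Exhaustively scan candidate thresholds on one numeric attribute.

    Returns (threshold, weighted_gini) for the split 'value <= threshold'
    with the lowest size-weighted Gini impurity. Candidate thresholds are
    midpoints between consecutive distinct sorted values (an illustrative
    convention, not taken from the paper).
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        # Equal neighbouring values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

This naive scan recomputes the impurity of both sides for every candidate and needs all records in one place, so it only illustrates what is being optimized, not how to do so efficiently on large distributed data.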

Zhu, M., Shen, D., Yu, G., Kou, Y., & Nie, T. (2013). Computing the split points for learning decision tree in MapReduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 339–353). https://doi.org/10.1007/978-3-642-37450-0_26