Scalable random forests for massive data

Bingguo Li; Xiaojun Chen; Mark Junjie Li; Joshua Zhexue Huang; Shengzhong Feng

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7301 LNAI(PART 1) 135-146

DOI: 10.1007/978-3-642-30217-6_12

10 Citations

23 Readers

Abstract

This paper proposes SRF, a scalable random forest algorithm with a MapReduce implementation. A breadth-first approach is used to grow the decision trees of a random forest model. At each level of the trees, a pair of map and reduce functions splits the nodes. Each mapper is dispatched to a local machine, where it computes the local histograms of the subspace features of the nodes from one data block. The local histograms are submitted to reducers, which compute the global histograms. The best split conditions of the nodes are calculated from the global histograms and sent to the controller on the master machine, which updates the random forest model. A random forest model is thus built with a sequence of map and reduce functions. Experiments on large synthetic data show that SRF scales with both the number of trees and the number of examples. The SRF algorithm is able to build a random forest of 100 trees in a little more than 1 hour from 110 gigabytes of data with 1000 features and 10 million records. © 2012 Springer-Verlag.
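
The map and reduce steps described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation: the function names (`map_block`, `reduce_histograms`, `best_splits`), the equal-width binning of features scaled to [0, 1), and the use of Gini impurity as the split criterion are all assumptions made for illustration, since the abstract does not specify them. The sketch handles one tree level only and runs in memory rather than on a MapReduce cluster.

```python
def map_block(block, node_of, features, n_bins, n_classes):
    """Mapper (hypothetical): local class histograms for one data block.

    block    : list of (x, y), x a list of floats in [0, 1), y an int label
    node_of  : function mapping x to the id of the tree node it falls in
    features : dict node_id -> list of feature indices (the random subspace)
    Returns a dict (node_id, feature) -> n_bins x n_classes count table.
    """
    hist = {}
    for x, y in block:
        node = node_of(x)
        for f in features.get(node, ()):
            table = hist.setdefault(
                (node, f), [[0] * n_classes for _ in range(n_bins)])
            b = min(int(x[f] * n_bins), n_bins - 1)  # assumed equal-width bins
            table[b][y] += 1
    return hist


def reduce_histograms(local_hists):
    """Reducer (hypothetical): sum local histograms into global ones."""
    total = {}
    for hist in local_hists:
        for key, table in hist.items():
            if key not in total:
                total[key] = [row[:] for row in table]
                continue
            for b, row in enumerate(table):
                for c, count in enumerate(row):
                    total[key][b][c] += count
    return total


def gini(counts):
    """Gini impurity of a class-count vector (assumed split criterion)."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def best_splits(global_hist, n_bins):
    """Per node, pick the (score, feature, threshold) with lowest weighted Gini."""
    best = {}
    for (node, f), table in global_hist.items():
        n_classes = len(table[0])
        total = [sum(table[b][c] for b in range(n_bins)) for c in range(n_classes)]
        left = [0] * n_classes
        for b in range(n_bins - 1):
            left = [l + v for l, v in zip(left, table[b])]
            right = [t - l for t, l in zip(total, left)]
            n_left, n_right = sum(left), sum(right)
            if n_left == 0 or n_right == 0:
                continue  # split would leave one child empty
            score = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
            candidate = (score, f, (b + 1) / n_bins)
            if node not in best or score < best[node][0]:
                best[node] = candidate
    return best


# Toy data: two "data blocks"; the label is 0 when x[0] < 0.5, else 1.
blocks = [
    [([0.1, 0.9], 0), ([0.2, 0.1], 0), ([0.7, 0.8], 1)],
    [([0.9, 0.2], 1), ([0.3, 0.6], 0), ([0.6, 0.4], 1)],
]
subspace = {0: [0, 1]}      # root node 0 considers features 0 and 1
root = lambda x: 0          # at the first level every record is in the root

local = [map_block(blk, root, subspace, n_bins=4, n_classes=2) for blk in blocks]
global_hist = reduce_histograms(local)
splits = best_splits(global_hist, n_bins=4)
print(splits)
```

In this sketch the list comprehension over `blocks` stands in for the mappers, and `splits` is what a controller would use to update the model before the next level is grown. Only the histograms, not the records, cross the map/reduce boundary, which is the property that keeps the communication cost independent of block size.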

Cite

Li, B., Chen, X., Li, M. J., Huang, J. Z., & Feng, S. (2012). Scalable random forests for massive data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7301 LNAI, pp. 135–146). https://doi.org/10.1007/978-3-642-30217-6_12
