Scalable random forests for massive data

Abstract

This paper proposes a scalable random forest algorithm, SRF, with a MapReduce implementation. A breadth-first approach is used to grow the decision trees of a random forest model. At each level of the trees, a pair of map and reduce functions splits the nodes. A mapper is dispatched to a local machine to compute the local histograms of the subspace features of the nodes from a data block. The local histograms are submitted to reducers to compute the global histograms, from which the best split conditions of the nodes are calculated and sent to the controller on the master machine to update the random forest model. A random forest model is thus built with a sequence of map and reduce functions. Experiments on large synthetic data show that SRF is scalable in both the number of trees and the number of examples. The SRF algorithm is able to build a random forest of 100 trees in a little more than one hour from 110 gigabytes of data with 1,000 features and 10 million records. © 2012 Springer-Verlag.
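To make the level-wise map/reduce step concrete, the following is a minimal Python sketch of the idea described in the abstract: mappers build local class-count histograms per (node, feature, bin) from their data block, a reducer merges them into global histograms, and the controller picks each node's best split from the merged counts. All names (`map_block`, `reduce_histograms`, `best_splits`), the binning scheme, and the Gini criterion are illustrative assumptions; the paper's actual implementation runs as Hadoop MapReduce jobs and may differ in detail.

```python
# Sketch (assumed, not the authors' code) of one breadth-first level of SRF-style split search.
from collections import defaultdict

def map_block(block, node_of, candidate_features, bin_of):
    """Mapper: emit ((node_id, feature, bin), class_counts) from one data block."""
    local = defaultdict(lambda: defaultdict(int))
    for x, y in block:                       # x: feature vector, y: class label
        node = node_of(x)                    # route the record to its current leaf node
        for f in candidate_features[node]:   # random feature subspace of this node
            b = bin_of(f, x[f])              # histogram bin for this feature value
            local[(node, f, b)][y] += 1
    return local.items()

def reduce_histograms(emitted):
    """Reducer: merge local histograms into global per-(node, feature, bin) class counts."""
    global_hist = defaultdict(lambda: defaultdict(int))
    for key, counts in emitted:
        for y, c in counts.items():
            global_hist[key][y] += c
    return global_hist

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_splits(global_hist):
    """Controller side: for each node, pick the (feature, bin boundary) split with the
    lowest weighted Gini impurity over the left/right partitions of its histogram."""
    by_node_feature = defaultdict(dict)
    for (node, f, b), counts in global_hist.items():
        by_node_feature[(node, f)][b] = counts
    best = {}
    for (node, f), hist in by_node_feature.items():
        bins_sorted = sorted(hist)
        for i in range(1, len(bins_sorted)):
            left, right = defaultdict(int), defaultdict(int)
            for b in bins_sorted[:i]:
                for y, c in hist[b].items():
                    left[y] += c
            for b in bins_sorted[i:]:
                for y, c in hist[b].items():
                    right[y] += c
            nl, nr = sum(left.values()), sum(right.values())
            score = (nl * gini(left) + nr * gini(right)) / (nl + nr)
            if node not in best or score < best[node][2]:
                best[node] = (f, bins_sorted[i], score)
    return best  # node_id -> (feature, split bin, impurity)
```

In the paper's setting, one such map/reduce pass would run per tree level, with the controller using the returned split conditions to extend the forest before the next level is processed.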

Citation (APA)

Li, B., Chen, X., Li, M. J., Huang, J. Z., & Feng, S. (2012). Scalable random forests for massive data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7301 LNAI, pp. 135–146). https://doi.org/10.1007/978-3-642-30217-6_12
