Scaling up inductive learning with massive parallelism

Abstract

Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research. © 1996 Kluwer Academic Publishers.
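The kind of per-example parallelism the abstract alludes to can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's actual CM-2 implementation: it scores one candidate conjunctive rule against every training example simultaneously using vectorized operations, the SIMD-style pattern a massively parallel machine exploits. The feature layout, rule representation, and the planted "special case" are all assumptions made for the example.

```python
import numpy as np

# Hypothetical data: a large set of categorical examples with a rare,
# planted special case that a small sample would likely miss.
rng = np.random.default_rng(0)
n_examples, n_features = 1_000_000, 8
X = rng.integers(0, 4, size=(n_examples, n_features))
y = (X[:, 0] == 1) & (X[:, 3] == 2)  # the hidden special-case label

def evaluate_rule(tests, X, y):
    """Score one conjunctive rule (a list of (feature, value) tests)
    over every example at once; each test is a single vectorized pass,
    analogous to broadcasting the test to all processors in parallel."""
    covered = np.ones(len(X), dtype=bool)
    for feature, value in tests:
        covered &= X[:, feature] == value
    coverage = int(np.count_nonzero(covered))
    positives = int(np.count_nonzero(covered & y))
    return coverage, positives

coverage, positives = evaluate_rule([(0, 1), (3, 2)], X, y)
print(coverage, positives)
```

In this sketch the rule matches only a small fraction of the million examples, yet its coverage and accuracy are computed in a handful of vectorized passes rather than a per-example loop, which is the sense in which rule evaluation is "inherently parallel."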

Citation (APA)

Provost, F. J. (1996). Scaling up inductive learning with massive parallelism. Machine Learning, 23(1), 33–46. https://doi.org/10.1007/bf00116898
