Scaling up inductive learning with massive parallelism

Abstract

Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research. © 1996 Kluwer Academic Publishers.
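The kind of per-example parallelism the abstract alludes to can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's actual CM-2 implementation: it scores one candidate conjunctive rule against every training example simultaneously using vectorized operations, the SIMD-style pattern a massively parallel machine exploits. The feature layout, rule representation, and the planted "special case" are all assumptions made for the example.

```python
import numpy as np

# Hypothetical data: a large set of categorical examples with a rare,
# planted special case that a small sample would likely miss.
rng = np.random.default_rng(0)
n_examples, n_features = 1_000_000, 8
X = rng.integers(0, 4, size=(n_examples, n_features))
y = (X[:, 0] == 1) & (X[:, 3] == 2)  # the hidden special-case label

def evaluate_rule(tests, X, y):
    """Score one conjunctive rule (a list of (feature, value) tests)
    over every example at once; each test is a single vectorized pass,
    analogous to broadcasting the test to all processors in parallel."""
    covered = np.ones(len(X), dtype=bool)
    for feature, value in tests:
        covered &= X[:, feature] == value
    coverage = int(np.count_nonzero(covered))
    positives = int(np.count_nonzero(covered & y))
    return coverage, positives

coverage, positives = evaluate_rule([(0, 1), (3, 2)], X, y)
print(coverage, positives)
```

In this sketch the rule matches only a small fraction of the million examples, yet its coverage and accuracy are computed in a handful of vectorized passes rather than a per-example loop, which is the sense in which rule evaluation is "inherently parallel."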

Citation (APA)

Provost, F. J. (1996). Scaling up inductive learning with massive parallelism. Machine Learning, 23(1), 33–46. https://doi.org/10.1007/bf00116898
