Scaling large learning problems with hard parallel mixtures

Abstract

A challenge for statistical learning is to deal with large data sets, e.g. in data mining. Popular learning algorithms such as Support Vector Machines have training time at least quadratic in the number of examples, which makes them hopeless for problems with a million examples. We propose a “hard parallelizable mixture” methodology which yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a “gater” model in such a way that it becomes easy to learn an “expert” model separately in each region of the partition. A probabilistic extension and the use of a set of generative models allow the gater to be represented so that all pieces of the model are trained locally. For SVMs, time complexity appears empirically to grow locally linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version, the iterative training algorithm provably decreases a cost function that is an upper bound on the negative log-likelihood.
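The abstract describes the training loop only at a high level. The following minimal Python sketch illustrates the hard-mixture idea, assuming binary labels in {-1, +1}; the names (train_hard_mixture, n_experts, n_iters) are illustrative, and the paper's gater is replaced here by a simple "assign each example to the expert with the largest margin" rule.

import numpy as np
from sklearn.svm import SVC


def train_hard_mixture(X, y, n_experts=4, n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Start from a random hard partition of the training set.
    assignment = rng.integers(0, n_experts, size=n)
    experts = [SVC(kernel="rbf", gamma="scale") for _ in range(n_experts)]
    fitted = [False] * n_experts

    for _ in range(n_iters):
        # Train each expert on its own subset only -- this is the step that
        # can run in parallel and keeps each SVM training problem small.
        for k in range(n_experts):
            idx = np.where(assignment == k)[0]
            if idx.size and np.unique(y[idx]).size > 1:
                experts[k].fit(X[idx], y[idx])
                fitted[k] = True
        # Re-partition: send every example to the expert that fits it best
        # (largest signed margin); unfitted experts get -inf and are ignored.
        margins = np.full((n, n_experts), -np.inf)
        for k in range(n_experts):
            if fitted[k]:
                margins[:, k] = y * experts[k].decision_function(X)
        assignment = np.argmax(margins, axis=1)
    return experts, assignment

At test time the paper routes an unseen point to an expert through the gater; this simplified sketch would need a separate router (for example, a classifier trained on (X, assignment)) to play that role.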

Citation (APA)

Collobert, R., Bengio, Y., & Bengio, S. (2002). Scaling large learning problems with hard parallel mixtures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2388, pp. 8–23). Springer Verlag. https://doi.org/10.1007/3-540-45665-1_2
