Variance-Reduced Methods for Machine Learning

Abstract

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight years have seen an exciting new development: variance reduction for stochastic optimization methods. These variance-reduced (VR) methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD both in theory and in practice. These speedups underline the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at nonexpert readers. We focus mainly on the convex setting and leave pointers to readers interested in extensions for minimizing nonconvex functions.
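
To make the idea of variance reduction concrete, the following is a minimal sketch of an SVRG-style update, one well-known VR method, applied to a least-squares problem. It is an illustration only and is not taken from the article text shown here; the function name svrg_least_squares, the synthetic data, the step-size rule, and the epoch count are all assumptions chosen for the example. Each outer pass stores a reference point and its full gradient, and each inner step corrects a single-sample gradient with that stored information, so the variance of the gradient estimate shrinks as the iterates approach the solution.

```python
import numpy as np


def svrg_least_squares(X, y, step_size, n_epochs, seed=0):
    """Illustrative SVRG-style loop for f(w) = 1/(2n) * sum_i (x_i @ w - y_i)**2.

    Hypothetical helper written for this example, not the article's code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        # Reference point and its full gradient (one pass over the data).
        w_ref = w.copy()
        full_grad = X.T @ (X @ w_ref - y) / n
        for _ in range(n):
            i = rng.integers(n)
            # Single-sample gradients at the current and reference points.
            g_i = X[i] * (X[i] @ w - y[i])
            g_i_ref = X[i] * (X[i] @ w_ref - y[i])
            # Unbiased estimate of the full gradient at w whose variance
            # vanishes as w and w_ref both approach the minimizer.
            w = w - step_size * (g_i - g_i_ref + full_grad)
    return w


# Small synthetic problem (assumed data, for illustration only).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

# Conservative step size based on the largest per-sample smoothness constant.
L_max = np.max(np.sum(X**2, axis=1))
w_svrg = svrg_least_squares(X, y, step_size=0.1 / L_max, n_epochs=100)

# Exact least-squares solution for comparison.
w_exact = np.linalg.lstsq(X, y, rcond=None)[0]
print("max abs difference:", np.max(np.abs(w_svrg - w_exact)))
```

With a constant step size, plain SGD on the same noisy problem would typically stall at a noise floor around the solution, whereas the corrected gradient estimate above lets a constant step size keep making progress over repeated passes through the data.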

Gower, R. M., Schmidt, M., Bach, F., & Richtarik, P. (2020). Variance-Reduced Methods for Machine Learning. Proceedings of the IEEE, 108(11), 1968–1983. https://doi.org/10.1109/JPROC.2020.3028013
