We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG±. Both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG± algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG± and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG± has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments showing that our worst-case upper bounds are quite tight even on simple artificial data. © 1997 Academic Press.
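As a minimal sketch of the two update rules described above, the following Python code contrasts an additive GD step with a multiplicative EG± step that keeps separate positive and negative weight vectors. The learning rates, the total-weight parameter U, the renormalization step, and the toy data are assumptions made for illustration; the paper gives the exact algorithms and their tunings.

```python
import numpy as np

def gd_update(w, x, y, eta=0.01):
    """One GD step: subtract the gradient of the squared error (y_hat - y)^2."""
    y_hat = w @ x
    grad = 2.0 * (y_hat - y) * x          # gradient of the squared loss w.r.t. w
    return w - eta * grad

def eg_pm_update(w_pos, w_neg, x, y, eta=0.1, U=1.0):
    """One EG± step (sketch): multiply each weight by an exponential of the
    corresponding gradient component, then renormalize so the total weight
    stays at U. The renormalization is an assumption of this sketch."""
    y_hat = (w_pos - w_neg) @ x
    grad = 2.0 * (y_hat - y) * x
    w_pos = w_pos * np.exp(-eta * grad)   # gradient components appear in the exponents
    w_neg = w_neg * np.exp(+eta * grad)
    z = (w_pos.sum() + w_neg.sum()) / U
    return w_pos / z, w_neg / z

# Hypothetical on-line trial sequence with a sparse target: only one input
# component is relevant, the setting where EG± is expected to incur small loss.
rng = np.random.default_rng(0)
n = 20
w_star = np.zeros(n); w_star[3] = 1.0
w_gd = np.zeros(n)
w_pos = np.full(n, 0.5 / n); w_neg = np.full(n, 0.5 / n)
for _ in range(200):
    x = rng.normal(size=n)
    y = w_star @ x
    w_gd = gd_update(w_gd, x, y)
    w_pos, w_neg = eg_pm_update(w_pos, w_neg, x, y)
```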
Kivinen, J., & Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1), 1–63. https://doi.org/10.1006/inco.1996.2612