Exponentiated Gradient versus Gradient Descent for Linear Predictors

Abstract

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG±. Both maintain a weight vector using simple updates. For the GD algorithm, the update subtracts the gradient of the squared error made on a prediction. The EG± algorithm uses the components of the gradient in the exponents of factors by which the weight vector is updated multiplicatively. We present worst-case loss bounds for EG± and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but that EG± has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments showing that our worst-case upper bounds are quite tight even on simple artificial data. © 1997 Academic Press.
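To make the two updates described in the abstract concrete, the sketch below implements one online step of each, assuming the squared loss (ŷ − y)² and a fixed total weight U for EG±. The learning rates, the uniform initialization, and the exact normalization are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def gd_update(w, x, y, eta):
    """One online gradient-descent step for the squared loss (y_hat - y)**2."""
    y_hat = w @ x
    grad = 2.0 * (y_hat - y) * x        # gradient of the loss w.r.t. w
    return w - eta * grad

def eg_pm_update(w_pos, w_neg, x, y, eta, U):
    """One online EG+- step (sketch).

    w_pos and w_neg are nonnegative and together sum to U; the effective
    weight vector is w_pos - w_neg.  Each component is multiplied by an
    exponentiated-gradient factor, then the pair is renormalized to total U.
    """
    y_hat = (w_pos - w_neg) @ x
    grad = 2.0 * (y_hat - y) * x        # same squared-loss gradient as above
    pos = w_pos * np.exp(-eta * grad)
    neg = w_neg * np.exp(+eta * grad)
    Z = (pos.sum() + neg.sum()) / U     # keep the total weight at U
    return pos / Z, neg / Z
```

A toy usage run in the regime the bounds favor EG± (only one relevant input component); the constants here are arbitrary placeholders:

```python
rng = np.random.default_rng(0)
d = 20
w = np.zeros(d)                                  # GD weights
w_pos = np.full(d, 0.5 / d)                      # uniform EG+- start, U = 1
w_neg = np.full(d, 0.5 / d)
for _ in range(1000):
    x = rng.normal(size=d)
    y = x[0]                                     # only component 0 is relevant
    w = gd_update(w, x, y, eta=0.01)
    w_pos, w_neg = eg_pm_update(w_pos, w_neg, x, y, eta=0.5, U=1.0)
```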

Citation (APA)

Kivinen, J., & Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1), 1–63. https://doi.org/10.1006/inco.1996.2612