Stable adaptive momentum for rapid online learning in nonlinear systems

Abstract

We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for the optimisation of very large nonlinear systems. Building on earlier work by Orr et al. on adaptive momentum, an efficient yet extremely unstable stochastic gradient descent algorithm, we develop a stabilised adaptive momentum algorithm suitable for noisy nonlinear optimisation problems. Stability is improved by introducing a forgetting factor λ, 0 ≤ λ ≤ 1, that smooths the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four-regions benchmark task.
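
The O(n) curvature matrix-vector product mentioned in the abstract can be realised with what is now commonly called the Pearlmutter trick: composing forward-mode automatic differentiation over a reverse-mode gradient yields a Hessian-vector product at roughly the cost of a few gradient evaluations, without ever forming the n × n matrix. The sketch below illustrates this in JAX for a linear neuron with squared loss on highly correlated inputs, matching the paper's first example. The sam_step update is a hypothetical smoothed-momentum form written for illustration, not the authors' exact rule, and the names and hyperparameter values (eta, lam, the synthetic data) are assumptions.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Squared loss of a linear neuron (the paper's first example).
    return 0.5 * jnp.mean((x @ w - y) ** 2)

def hvp(w, v, x, y):
    # Hessian-vector product H(w) @ v in O(n): forward-mode (jvp) over
    # reverse-mode (grad), so the n x n curvature matrix is never built.
    grad_fn = lambda w_: jax.grad(loss)(w_, x, y)
    _, hv = jax.jvp(grad_fn, (w,), (v,))
    return hv

def sam_step(w, dw, x, y, eta=0.05, lam=0.9):
    # Hypothetical smoothed-momentum step, for illustration only.
    # The forgetting factor lam in [0, 1] blends the previous step
    # direction, curvature-corrected via the Hessian-vector product,
    # with a fresh gradient step; lam = 0 recovers plain gradient descent.
    g = jax.grad(loss)(w, x, y)
    dw = lam * (dw - eta * hvp(w, dw, x, y)) - eta * g
    return w + dw, dw

# Usage on synthetic, highly correlated inputs (assumed data).
z = jax.random.normal(jax.random.PRNGKey(0), (256, 1))
x = z + 0.01 * jax.random.normal(jax.random.PRNGKey(1), (256, 4))
w_true = jnp.array([1.0, -2.0, 0.5, 3.0])
y = x @ w_true

w = jnp.zeros(4)
dw = jnp.zeros(4)
for _ in range(100):
    w, dw = sam_step(w, dw, x, y)
```

Under these assumptions, each step costs a small, n-independent number of gradient evaluations, which is the source of the scalability claim in the abstract.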

Citation (APA)

Graepel, T., & Schraudolph, N. N. (2002). Stable adaptive momentum for rapid online learning in nonlinear systems. In Lecture Notes in Computer Science (Vol. 2415, pp. 450–455). Springer-Verlag. https://doi.org/10.1007/3-540-46084-5_73
