Theoretical analysis of function of derivative term in on-line gradient descent learning

Abstract

In on-line gradient descent learning, the local property of the derivative term of the output can slow convergence. Improving the derivative term, for example by using the natural gradient, has been proposed as a way to speed up convergence. Besides such sophisticated methods, a "simple method" that replaces the derivative term with a constant has been proposed and empirically shown to greatly increase convergence speed. Although this phenomenon has been analyzed empirically, a theoretical analysis is needed to establish its generality. In this paper, we theoretically analyze the effect of using the simple method. Our results show that, with the simple method, the generalization error decreases faster than with the true gradient descent method when the learning step is smaller than the optimum value η_opt. When the learning step is larger than η_opt, the generalization error decreases more slowly with the simple method, and the residual error is larger than with the true gradient descent method. Moreover, when there is output noise, η_opt is no longer optimum; thus, the simple method is not robust in noisy circumstances. © 2012 Springer-Verlag.
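To make the contrast concrete, below is a minimal sketch (not the authors' code) of on-line gradient descent in a teacher-student perceptron setting, comparing the true update, which keeps the output derivative g'(h), with the simple method, which replaces g'(h) by a constant. The erf output function, input dimension N, learning step eta, and the constant value 1 are illustrative assumptions.

    # Minimal sketch: true on-line gradient descent vs. the "simple method"
    # that replaces the output derivative g'(h) with a constant.
    # Teacher-student setup, erf activation, and eta = 0.1 are assumptions.
    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(0)
    N = 100                                   # input dimension
    eta = 0.1                                 # learning step (eta_opt depends on the setup)
    g = lambda h: erf(h / np.sqrt(2))                           # output function
    g_prime = lambda h: np.sqrt(2 / np.pi) * np.exp(-h**2 / 2)  # its derivative

    B = rng.standard_normal(N) / np.sqrt(N)   # teacher weights
    w_true = rng.standard_normal(N) / np.sqrt(N)   # student trained with true gradient
    w_simple = w_true.copy()                       # student trained with simple method

    for _ in range(10000):
        x = rng.standard_normal(N)
        t = g(B @ x)                          # teacher (target) output

        # True on-line gradient descent: keep the derivative term g'(h)
        h = w_true @ x
        w_true += (eta / N) * (t - g(h)) * g_prime(h) * x

        # "Simple method": replace the derivative term by a constant (here 1)
        h = w_simple @ x
        w_simple += (eta / N) * (t - g(h)) * 1.0 * x

    # Compare alignment with the teacher after training
    for name, w in [("true gradient", w_true), ("simple method", w_simple)]:
        cos = w @ B / (np.linalg.norm(w) * np.linalg.norm(B))
        print(name, "overlap with teacher:", round(cos, 3))

With a learning step below η_opt the simple method's student typically aligns with the teacher in fewer examples, matching the behavior analyzed in the paper; with a larger step, or with output noise added to the teacher, its advantage disappears.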

Citation (APA)

Hara, K., Katahira, K., Okanoya, K., & Okada, M. (2012). Theoretical analysis of function of derivative term in on-line gradient descent learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7553 LNCS, pp. 9–16). https://doi.org/10.1007/978-3-642-33266-1_2
