Training deep neural networks using conjugate gradient-like methods

Abstract

The goal of this article is to accelerate useful adaptive learning rate optimization algorithms, such as AdaGrad, RMSProp, Adam, and AMSGrad, for training deep neural networks. To reach this goal, we devise an iterative algorithm that combines the existing adaptive learning rate optimization algorithms with conjugate gradient-like methods, which are useful for constrained optimization. Convergence analyses show that the proposed algorithm with a small constant learning rate approximates a stationary point of a nonconvex optimization problem in deep learning. Furthermore, it is shown that the proposed algorithm with diminishing learning rates converges to a stationary point of the nonconvex optimization problem. The convergence and performance of the algorithm are demonstrated through numerical comparisons with the existing adaptive learning rate optimization algorithms for image and text classification. The numerical results show that the proposed algorithm with a constant learning rate is superior to the existing algorithms for training neural networks.
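The abstract does not spell out the update rule, but the combination it describes can be pictured as replacing the momentum-style first-moment term of an adaptive method with a conjugate gradient-like search direction, while keeping the second-moment scaling. The following is a minimal sketch under that reading, using a Fletcher-Reeves-style conjugate coefficient and Adam/RMSProp-style scaling; the function name, hyperparameters, and coefficient choice are illustrative assumptions, not the paper's exact CoBA algorithm.

```python
import numpy as np

def cg_like_adaptive_step(grad_fn, x0, alpha=1e-3, beta2=0.999,
                          eps=1e-8, num_steps=1000):
    """Hypothetical adaptive-learning-rate method with a conjugate
    gradient-like search direction (illustrative sketch only).

    grad_fn: callable returning the (stochastic) gradient at x.
    """
    x = x0.copy()
    g = grad_fn(x)
    d = -g                       # initial direction: steepest descent
    v = np.zeros_like(x)         # second-moment estimate (RMSProp/Adam-style)
    for _ in range(num_steps):
        v = beta2 * v + (1 - beta2) * g * g       # adaptive scaling term
        x = x + alpha * d / (np.sqrt(v) + eps)    # scaled step along d
        g_new = grad_fn(x)
        # Fletcher-Reeves-style conjugate coefficient
        gamma = (g_new @ g_new) / (g @ g + eps)
        d = -g_new + gamma * d                    # conjugate-like direction
        g = g_new
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x.
x_star = cg_like_adaptive_step(lambda x: 2 * x, np.array([3.0, -2.0]))
```

Scaling the conjugate direction elementwise by the inverse square root of the second-moment estimate is one natural way to marry the two families of methods; the paper's actual coefficients, learning rate conditions, and convergence guarantees are given in the full text.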

Citation (APA)

Iiduka, H., & Kobayashi, Y. (2020). Training deep neural networks using conjugate gradient-like methods. Electronics (Switzerland), 9(11), 1–25. https://doi.org/10.3390/electronics9111809
