Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate

Abstract

First-order optimization algorithms have proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of "long-term memory" in Adam-like algorithms, which can hamper their performance and lead to divergence. In our study, we observe that there are benefits to weighting past gradients more heavily when designing the adaptive learning rate. We therefore propose an algorithm called Nostalgic Adam (NosAdam), with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam, as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative to Adam. The proofs, code, and other supplementary materials have been released.
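As a rough illustration of the idea (a minimal sketch based on the abstract's description, not the authors' released code), the snippet below implements a NosAdam-style update in NumPy: the second-moment decay factor is step-dependent, beta_{2,k} = B_{k-1}/B_k with B_k = sum_{j<=k} b_j and b_k = k^(-gamma), so that past squared gradients retain relatively more weight than under Adam's fixed exponential decay. The hyperparameter names, defaults, and omission of bias correction are illustrative assumptions.

```python
import numpy as np

def nosadam_update(params, grads, state, lr=1e-3, beta1=0.9,
                   gamma=0.1, eps=1e-8):
    """One NosAdam-style step (illustrative sketch, not the authors' code).

    Unlike Adam's fixed exponential decay beta2, the second-moment decay
    here is step-dependent: beta_{2,k} = B_{k-1} / B_k with
    B_k = sum_{j<=k} b_j and b_k = k**(-gamma). Since b_k shrinks as k
    grows, past squared gradients keep relatively more weight.
    Bias correction is omitted for brevity.
    """
    k = state["step"] = state.get("step", 0) + 1
    b_k = k ** (-gamma)                       # weight of the current step
    B_prev = state.get("B", 0.0)
    B_k = state["B"] = B_prev + b_k
    beta2_k = B_prev / B_k                    # step-dependent decay factor

    # First moment: same exponential moving average as Adam.
    m = state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grads
    # Second moment: "nostalgic" weighted average of squared gradients.
    v = state["v"] = beta2_k * state.get("v", 0.0) + (1 - beta2_k) * grads ** 2

    return params - lr * m / (np.sqrt(v) + eps)


# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x, state = np.array([1.0, -2.0]), {}
for _ in range(2000):
    x = nosadam_update(x, 2 * x, state, lr=0.01)
print(x)  # should approach [0, 0]
```

Setting gamma = 0 makes b_k constant, so every past gradient is weighted equally; larger gamma shifts even more weight toward early gradients, which is the "nostalgia" the title refers to.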

Citation (APA)

Huang, H., Wang, C., & Dong, B. (2019). Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 2556–2562). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/355
