Non-convergence and Limit Cycles in the Adam Optimizer

Abstract

One of the most popular training algorithms for deep neural networks is Adaptive Moment Estimation (Adam), introduced by Kingma and Ba. Despite its success in many applications, there is no satisfactory convergence analysis: only local convergence can be shown for batch mode under some restrictions on the hyperparameters, and counterexamples exist for incremental mode. Recent results show that, for simple quadratic objective functions, limit cycles of period 2 exist in batch mode, but only for atypical hyperparameters and only for the algorithm without bias correction. We extend the convergence analysis to all choices of the hyperparameters for quadratic functions. This finally answers the question of convergence for Adam in batch mode in the negative. We analyze the stability of these limit cycles and relate our analysis to other results in which approximate convergence was shown, but under the additional assumption of bounded gradients, which does not hold for quadratic functions. Due to the complexity of the equations, the investigation relies heavily on computer algebra.
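To make the setting concrete, the following is a minimal sketch of the standard Adam update (with bias correction, as in Kingma and Ba) applied in batch mode to a one-dimensional quadratic f(x) = 0.5·c·x². The curvature c, the step size, and the other hyperparameter values are illustrative assumptions, not taken from the paper; inspecting the tail of the trajectory for different hyperparameter choices is one way to see whether the iterates settle at the minimizer or keep oscillating, e.g. in a period-2 pattern of the kind discussed above.

```python
import numpy as np

def adam_on_quadratic(x0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Standard Adam with bias correction on f(x) = 0.5 * c * x**2 in batch mode.

    Hyperparameter defaults are illustrative assumptions for experimentation.
    """
    c = 1.0                # curvature of the quadratic (assumed for illustration)
    x, m, v = x0, 0.0, 0.0
    trajectory = [x]
    for t in range(1, steps + 1):
        g = c * x                          # exact (full-batch) gradient of 0.5*c*x**2
        m = beta1 * m + (1 - beta1) * g    # first-moment estimate
        v = beta2 * v + (1 - beta2) * g**2 # second-moment estimate
        m_hat = m / (1 - beta1 ** t)       # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)       # bias-corrected second moment
        x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
        trajectory.append(x)
    return np.array(trajectory)

if __name__ == "__main__":
    traj = adam_on_quadratic(x0=1.0)
    # Inspect the tail: does it converge to 0 or approach an oscillatory pattern?
    print(traj[-10:])
```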

Citation (APA)
Bock, S., & Weiß, M. (2019). Non-convergence and Limit Cycles in the Adam Optimizer. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11728 LNCS, pp. 232–243). Springer Verlag. https://doi.org/10.1007/978-3-030-30484-3_20
