Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo; Bin Gu; Heng Huang

Conference Proceedings, Open Access

35th AAAI Conference on Artificial Intelligence, AAAI 2021 (2021) 9A 7883-7890

DOI: 10.1609/aaai.v35i9.16962

13 Citations

9 Readers

Abstract

Training deep neural networks with a large batch size has shown promising results and benefits many real-world applications. Warmup is one of the nontrivial techniques used to stabilize the convergence of large-batch training. However, warmup is an empirical method, and it is still unknown whether there is a better algorithm with theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We prove the convergence of our algorithm by introducing a new fine-grained analysis of gradient-based methods. Furthermore, the new analysis also helps to understand two other empirical tricks: layer-wise adaptive rate scaling and linear learning rate scaling. We conduct extensive experiments and demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and surpasses the convergence of the state-of-the-art large-batch optimizer when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.
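
The three empirical techniques named in the abstract (linear learning rate scaling, gradual warmup, and layer-wise adaptive rate scaling) can be illustrated with the minimal Python sketch below, which uses their commonly cited forms. It is not the CLARS algorithm, whose update rule is not given in this abstract. The function names, the trust coefficient default, and the example numbers are illustrative assumptions.

```python
import math


def linear_scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: grow the learning rate in proportion to batch size."""
    return base_lr * batch / base_batch


def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp linearly up to target_lr over warmup_steps steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_layer_lr(global_lr, weights, grads, trust_coef=0.001,
                  weight_decay=0.0, eps=1e-12):
    """LARS-style rate for one layer: scale by ||w|| / (||g|| + wd * ||w||).

    trust_coef, weight_decay, and eps are assumed hyperparameters for this
    sketch, not values taken from the paper.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        # Fall back to the global rate when the ratio is undefined.
        return global_lr
    return global_lr * trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    # Hypothetical setting: base rate 0.1 at batch 256, scaled to batch 8192.
    target = linear_scaled_lr(0.1, 256, 8192)
    schedule = [warmup_lr(target, s, 5) for s in range(7)]
    print("target lr:", target)
    print("warmup schedule:", schedule)
    print("layer lr:", lars_layer_lr(target, [3.0, 4.0], [0.6, 0.8]))
```

In this sketch, warmup changes only the global rate over time, while the layer-wise ratio rescales that rate per layer based on the weight and gradient norms.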

Cite

Huo, Z., Gu, B., & Huang, H. (2021). Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 9A, pp. 7883–7890). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i9.16962
