In neural network optimization, the learning rate of the gradient descent strongly affects performance. This prevents reliable out-of-the-box training of a model on a new problem. We propose the All Learning Rates At Once (Alrao) algorithm for deep learning architectures: each neuron or unit in the network gets its own learning rate, randomly sampled at startup from a distribution spanning several orders of magnitude. The network becomes a mixture of slow and fast learning units. Surprisingly, Alrao performs close to SGD with an optimally tuned learning rate, for various tasks and network architectures. In our experiments, all Alrao runs were able to learn well without any tuning.
CITATION STYLE
Blier, L., Wolinski, P., & Ollivier, Y. (2020). Learning with Random Learning Rates. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11907 LNAI, pp. 449–464). Springer. https://doi.org/10.1007/978-3-030-46147-8_27
Mendeley helps you to discover research relevant for your work.