The expressiveness and generalization of deep models were recently addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of an NN during gradient-descent (GD) optimization were related to a gradient similarity kernel, also known as the Neural Tangent Kernel (NTK) [9]. In the majority of works this kernel is considered to be time-invariant [9, 13]. In contrast, we empirically explore its properties along the optimization and show that in practice the top eigenfunctions of the NTK align toward the target function learned by the NN, which improves the overall optimization performance. Moreover, these top eigenfunctions serve as basis functions for the NN output: the function represented by the NN is spanned almost completely by them for the entire optimization process. Further, we study how learning-rate decay affects the neural spectrum. We argue that the presented phenomena may lead to a more complete theoretical understanding of NN learning.
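The quantities the abstract refers to can be made concrete with a small sketch: the empirical NTK on a finite sample is the Gram matrix of per-sample parameter gradients, and "spanned by the top eigenfunctions" corresponds to the fraction of the output vector's norm captured by the top eigenvectors of that matrix. The following NumPy sketch is purely illustrative (the tiny one-hidden-layer architecture, sizes, and function names are assumptions, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network f(x) = w2 . tanh(W1 x)  (illustrative only)
d, h, n = 3, 16, 40
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
X = rng.normal(size=(n, d))

def forward_and_jacobian(X, W1, w2):
    """Return outputs f(X) and the (n x p) matrix of per-sample parameter gradients."""
    pre = X @ W1.T                # (n, h) pre-activations
    act = np.tanh(pre)            # (n, h) hidden activations
    out = act @ w2                # (n,)  network outputs
    # d out / d w2      = act
    # d out / d W1[j,k] = w2[j] * (1 - tanh^2(pre[j])) * x[k]
    dW1 = (w2 * (1 - act**2))[:, :, None] * X[:, None, :]   # (n, h, d)
    J = np.hstack([dW1.reshape(X.shape[0], -1), act])       # (n, h*d + h)
    return out, J

f, J = forward_and_jacobian(X, W1, w2)
K = J @ J.T                          # empirical NTK Gram matrix on the sample
evals, evecs = np.linalg.eigh(K)     # eigenvalues in ascending order
top = evecs[:, -5:]                  # top-5 NTK eigenvectors
spanned = np.linalg.norm(top.T @ f) ** 2 / np.linalg.norm(f) ** 2
print(f"fraction of ||f||^2 in top-5 NTK eigenvectors: {spanned:.3f}")
```

Tracking `spanned` (and the alignment of the top eigenvectors with the target labels) across GD steps is the kind of measurement the paper's empirical study performs; at initialization the value carries no particular meaning, and the paper's claim concerns how it evolves during training.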
CITATION STYLE
Kopitkov, D., & Indelman, V. (2020). Neural Spectrum Alignment: Empirical Study. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12397 LNCS, pp. 168–179). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61616-8_14