Second-order optimization methods can accelerate convergence by preconditioning the gradient with the curvature matrix, and there have been many attempts to use them for training deep neural networks. In this work, inspired by diagonal approximations and factored approximations such as Kronecker-factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM into a Kronecker product of two smaller matrices scaled by a coefficient related to the trace. We theoretically analyze TKFAC's approximation error and give an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks to maintain the advantage of second-order optimization methods during training. Experiments show that our method performs better than several state-of-the-art algorithms on some deep network architectures.
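To make the trace idea concrete, the following is a minimal NumPy sketch for a single fully-connected layer. The Kronecker factors A and G and the identity tr(A ⊗ G) = tr(A)·tr(G) follow the standard KFAC setup; the particular form of the scaling coefficient `c` below is an assumption chosen for illustration so that the approximate block's trace matches the exact block's trace, not necessarily the paper's exact definition.

```python
import numpy as np

# Sketch of a trace-restricted Kronecker-factored FIM block for one
# fully-connected layer (illustrative only; coefficient form is assumed).

rng = np.random.default_rng(0)
n_samples, d_in, d_out = 512, 20, 10

# a: layer inputs (activations), g: backpropagated output gradients.
a = rng.standard_normal((n_samples, d_in))
g = rng.standard_normal((n_samples, d_out))

# Kronecker factors as in KFAC: A = E[a a^T], G = E[g g^T].
A = a.T @ a / n_samples
G = g.T @ g / n_samples

# Exact FIM block trace: tr(F) = E[||a||^2 * ||g||^2], since the
# per-sample weight gradient is the outer product g a^T.
trace_exact = np.mean(np.sum(a**2, axis=1) * np.sum(g**2, axis=1))

# Trace of the plain Kronecker approximation: tr(A ⊗ G) = tr(A) * tr(G).
trace_kfac = np.trace(A) * np.trace(G)

# Trace-matching coefficient (hypothetical form used in this sketch).
c = trace_exact / trace_kfac

F_tkfac = c * np.kron(A, G)            # trace-restricted approximation
print(np.trace(F_tkfac), trace_exact)  # the two traces now agree
```

In practice such a preconditioner would be inverted per block (with damping) and applied to the gradient, rather than materialized as the full Kronecker product shown here.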
Gao, K., Liu, X., Huang, Z., Wang, M., Wang, Z., Xu, D., & Yu, F. (2021). A Trace-restricted Kronecker-factored Approximation to Natural Gradient. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 9A, pp. 7519–7527). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i9.16921