A Trace-restricted Kronecker-factored Approximation to Natural Gradient

Abstract

Second-order optimization methods can accelerate convergence by conditioning the gradient with the curvature matrix, and there have been many attempts to apply them to training deep neural networks. In this work, inspired by diagonal approximations and factored approximations such as Kronecker-factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM as a Kronecker product of two smaller matrices, scaled by a coefficient related to the trace. We theoretically analyze TKFAC's approximation error and give an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks to maintain the advantage of second-order optimization methods during training. Experiments show that our method performs better than several state-of-the-art algorithms on some deep network architectures.
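
To make the trace restriction concrete, the sketch below builds one Kronecker-factored block whose trace matches a given target, using the identity tr(A ⊗ S) = tr(A) · tr(S). This is a minimal NumPy illustration of the idea described above, not the authors' implementation; the names `A`, `S`, `trace_F`, and `tkfac_block` are assumptions, not the paper's notation.

```python
import numpy as np

def tkfac_block(A, S, trace_F):
    """Approximate a Fisher block F as c * (A kron S), where the
    coefficient c is chosen so the approximation keeps the trace of
    the exact block: tr(c * (A kron S)) = trace_F.
    Relies on the identity tr(A kron S) = tr(A) * tr(S)."""
    c = trace_F / (np.trace(A) * np.trace(S))
    return c * np.kron(A, S)

# Toy usage with random symmetric positive semidefinite factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A = A @ A.T
S = rng.standard_normal((2, 2)); S = S @ S.T
F_approx = tkfac_block(A, S, trace_F=10.0)
print(np.trace(F_approx))  # -> 10.0, the restricted trace
```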

Citation (APA)

Gao, K., Liu, X., Huang, Z., Wang, M., Wang, Z., Xu, D., & Yu, F. (2021). A Trace-restricted Kronecker-factored Approximation to Natural Gradient. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 9A, pp. 7519–7527). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i9.16921
