Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)

Abstract

Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of a network in the infinite-width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced in [25]. In this paper, we study the dynamics of the NTK for a finite-width Deep Residual Network (ResNet) using the neural tangent hierarchy (NTH) proposed in [24]. For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width m with respect to the number of training samples n from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph over fully-connected networks.
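For orientation, here is a minimal sketch of the objects the abstract refers to, following the general NTK/NTH setup of [25] and [24]; the notation below (n training pairs (x_β, y_β), network output f(θ_t, x) with parameters θ_t evolving under gradient flow on the squared loss) is an illustrative assumption, not the paper's exact formulation:

```latex
% Empirical NTK of a network f(\theta, x) (setup as in [25]):
K^{(2)}_t(x, x') \;=\; \big\langle \nabla_\theta f(\theta_t, x),\, \nabla_\theta f(\theta_t, x') \big\rangle .

% Under gradient flow on the squared loss, the network outputs evolve as
\frac{d}{dt} f(\theta_t, x_\alpha)
  \;=\; -\frac{1}{n} \sum_{\beta=1}^{n} K^{(2)}_t(x_\alpha, x_\beta)\,
        \big( f(\theta_t, x_\beta) - y_\beta \big).

% At finite width the kernel itself moves; the NTH of [24] tracks this
% through an infinite family of higher-order kernels:
\frac{d}{dt} K^{(r)}_t(x_{\alpha_1}, \dots, x_{\alpha_r})
  \;=\; -\frac{1}{n} \sum_{\beta=1}^{n}
        K^{(r+1)}_t(x_{\alpha_1}, \dots, x_{\alpha_r}, x_\beta)\,
        \big( f(\theta_t, x_\beta) - y_\beta \big),
  \qquad r \ge 2 .
```

In the infinite-width limit K^{(2)} is constant in time and the first equation alone closes; the paper's contribution is to control this hierarchy for a finite-width ResNet, with the width m required to grow only cubically (rather than quartically) in n.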

Citation (APA)

Li, Y., Luo, T., & Yip, N. K. (2022). Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH). CSIAM Transactions on Applied Mathematics, 3(4), 692–760. https://doi.org/10.4208/csiam-am.SO-2021-0053
