Handling Vanishing Gradient Problem Using Artificial Derivative

61 Citations · 100 Mendeley Readers

This article is free to access.

Abstract

The sigmoid function and ReLU are commonly used activation functions in neural networks (NN). However, the sigmoid function is vulnerable to the vanishing gradient problem, while ReLU suffers from a special case of it known as the dying ReLU problem. Although many studies have proposed methods to alleviate this problem, an efficient and practical solution has remained elusive. Hence, we propose a method that replaces the original derivative function with an artificial derivative in a pertinent way. Our method optimizes the gradients of activation functions without modifying the activation functions themselves or introducing extra layers. Our investigations demonstrate that the method can effectively alleviate the vanishing gradient problem for both ReLU and the sigmoid function with little computational cost.
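As a rough illustration of the general idea (not the paper's exact construction), the sketch below overrides the backward pass of ReLU in PyTorch so that backpropagation uses an artificial derivative while the forward computation is unchanged. The class name, the constant slope of 0.01 assigned to negative inputs, and the choice of torch.autograd.Function are assumptions made for this example.

import torch

class ReLUWithArtificialGrad(torch.autograd.Function):
    """Standard ReLU in the forward pass; the backward pass uses an
    artificial derivative that stays nonzero for negative inputs."""

    @staticmethod
    def forward(ctx, x, negative_slope=0.01):
        ctx.save_for_backward(x)
        ctx.negative_slope = negative_slope
        return x.clamp(min=0)  # ordinary ReLU output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Artificial derivative: 1 where x > 0, a small constant elsewhere,
        # instead of the true ReLU derivative (0 for x <= 0). The 0.01 value
        # is an illustrative choice, not a value taken from the paper.
        artificial_deriv = torch.where(
            x > 0,
            torch.ones_like(x),
            torch.full_like(x, ctx.negative_slope),
        )
        # Second return value is the (nonexistent) gradient for negative_slope.
        return grad_output * artificial_deriv, None

# Example: units with negative pre-activations still receive a small gradient.
x = torch.randn(4, requires_grad=True)
y = ReLUWithArtificialGrad.apply(x)
y.sum().backward()
print(x.grad)  # 1.0 where x > 0, 0.01 elsewhere -- never exactly zero

Because only the backward rule changes, the network's outputs are identical to those of a standard ReLU network; only the gradient flow differs, which is what mitigates dying units in this sketch.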

Cite (APA)

Hu, Z., Zhang, J., & Ge, Y. (2021). Handling Vanishing Gradient Problem Using Artificial Derivative. IEEE Access, 9, 22371–22377. https://doi.org/10.1109/ACCESS.2021.3054915
