Abstract
Gradient-based optimization algorithms are the standard methods for adapting the weights of neural networks. The natural gradient gives the steepest descent direction with respect to a non-Euclidean metric in weight space that is, from a theoretical point of view, more appropriate. While the natural gradient has already proven to be advantageous for online learning, we explore its benefits for batch learning: we empirically compare Rprop (resilient backpropagation), one of the best performing first-order learning algorithms, using the Euclidean and the non-Euclidean metric, respectively. As batch steepest descent on the natural gradient is closely related to Levenberg-Marquardt optimization, we add this method to our comparison.

It turns out that the Rprop algorithm can indeed profit from the natural gradient: the optimization speed, measured in terms of weight updates, can increase significantly compared to the original version. Rprop based on the non-Euclidean metric performs at least as well as Levenberg-Marquardt optimization on the two benchmark problems considered and appears to be slightly more robust. However, in both Levenberg-Marquardt optimization and Rprop using the natural gradient, computing a weight update requires cubic time and quadratic space. Further, both methods have additional hyperparameters that are difficult to adjust. In contrast, conventional Rprop has linear space and time complexity, and its hyperparameters need no difficult tuning.
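To make the contrast concrete, below is a minimal Python sketch of the two ingredients the abstract names: a per-weight, sign-based Rprop update (here the iRprop- variant; the abstract does not specify which Rprop variant the paper uses) and a natural-gradient direction obtained by solving against the Fisher matrix. The function names, the damping term, and the toy quadratic in the usage example are illustrative assumptions, not the authors' code; the O(n^3) time and O(n^2) space of the linear solve are what the abstract's complexity remarks refer to.

    import numpy as np

    def rprop_step(w, grad, prev_grad, delta,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_min=1e-6, delta_max=50.0):
        # One iRprop- update: step sizes grow after consecutive gradient
        # components of equal sign, shrink after a sign flip. Only the
        # sign of the (natural) gradient enters the weight change.
        # Hypothetical sketch; default constants are the standard Rprop values.
        sign_change = grad * prev_grad
        delta = np.where(sign_change > 0,
                         np.minimum(delta * eta_plus, delta_max), delta)
        delta = np.where(sign_change < 0,
                         np.maximum(delta * eta_minus, delta_min), delta)
        grad = np.where(sign_change < 0, 0.0, grad)  # iRprop-: forget gradient after a sign flip
        w = w - np.sign(grad) * delta
        return w, grad, delta

    def natural_gradient_direction(grad, fisher, damping=1e-4):
        # Natural-gradient direction F^{-1} * grad. Solving the n-by-n
        # system costs cubic time and quadratic space, as stated above.
        # The damping term is an assumption added for numerical stability.
        n = fisher.shape[0]
        return np.linalg.solve(fisher + damping * np.eye(n), grad)

    # Toy usage on a quadratic error E(w) = 0.5 * w' A w (hypothetical example;
    # A stands in for the Fisher matrix here).
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    w = np.array([5.0, -4.0])
    prev_grad = np.zeros_like(w)
    delta = np.full_like(w, 0.1)
    for _ in range(100):
        g = A @ w                                 # gradient of the quadratic
        ng = natural_gradient_direction(g, A)     # non-Euclidean descent direction
        w, prev_grad, delta = rprop_step(w, ng, prev_grad, delta)

Feeding the natural-gradient direction (rather than the plain gradient) into the unchanged Rprop update is one plausible reading of "Rprop using the natural gradient"; since Rprop only uses gradient signs, the metric enters solely through the direction passed in.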
Citation
Igel, C., Toussaint, M., & Weishui, W. (2005). Rprop Using the Natural Gradient. In Trends and Applications in Constructive Approximation (pp. 259–272). Birkhäuser Basel. https://doi.org/10.1007/3-7643-7356-3_19