The convergence of back-propagation is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
CITATION STYLE
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K.-R. (1998). Efficient BackProp (pp. 9–50). https://doi.org/10.1007/3-540-49430-8_2