In this work, we generalize and unify two recent, quite different approaches of Jascha [10] and Lee [2] by proposing the proximal stochastic Newton-type gradient (PROXTONE) method for minimizing the sum of two convex functions: one is the average of a huge number of smooth convex functions, and the other is a nonsmooth convex function. PROXTONE incorporates second-order information to obtain stronger convergence results, in that it achieves a linear convergence rate not only in the value of the objective function but also in the iterates. The proofs are simple and intuitive, and the results and techniques can serve as a starting point for research on proximal stochastic methods that employ second-order information. The methods and principles proposed in this paper apply to problems such as logistic regression and the training of deep neural networks. Our numerical experiments show that PROXTONE achieves better computational performance than existing methods.
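The composite objective described above has the form F(x) = (1/n) Σ f_i(x) + h(x), with smooth f_i and nonsmooth h. As an illustration only (not the authors' PROXTONE algorithm), the sketch below shows a proximal stochastic step that uses a crude diagonal second-order scaling, applied to L1-regularized logistic regression; all function names, the step size `eta`, and the diagonal curvature estimate are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||x||_1 (coordinate-wise shrinkage).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_scaled_step(x, grad_i, H_diag, lam, eta):
    # One scaled proximal step: with a diagonal curvature estimate H_diag,
    # the local quadratic model plus lam * ||x||_1 separates coordinate-wise,
    # so the minimizer is a soft-thresholded scaled gradient step.
    z = x - eta * grad_i / H_diag
    return soft_threshold(z, eta * lam / H_diag)

def run(A, b, lam=0.05, eta=0.1, epochs=50, seed=0):
    # Stochastic proximal iterations for L1-regularized logistic regression.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Crude fixed diagonal curvature estimate for the logistic loss
    # (0.25 bounds the second derivative of the logistic function).
    H_diag = 0.25 * (A ** 2).mean(axis=0) + 1e-3
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = 1.0 / (1.0 + np.exp(-(A[i] @ x)))  # predicted probability
            grad_i = (p - b[i]) * A[i]             # per-sample gradient
            x = prox_scaled_step(x, grad_i, H_diag, lam, eta)
    return x
```

The diagonal scaling is the simplest stand-in for second-order information; the point is only that the nonsmooth term is handled through its proximal operator while the smooth part is sampled stochastically.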
Shi, Z., & Liu, R. (2015). Large scale optimization with proximal stochastic newton-type gradient descent. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9284, pp. 691–704). Springer Verlag. https://doi.org/10.1007/978-3-319-23528-8_43