Why ADAGRAD fails for online topic modeling

Abstract

Online topic modeling, i.e., topic modeling with stochastic variational inference, is a powerful and efficient technique for analyzing large datasets, and ADAGRAD is a widely used technique for tuning learning rates during online gradient optimization. However, the two techniques do not work well together. We show that this is because ADAGRAD uses the accumulated sum of squared gradients as the denominator of each per-parameter learning rate. In online topic modeling, gradient magnitudes are very large, so this accumulator grows rapidly; the learning rates therefore shrink very quickly, and the parameters cannot fully converge before training ends.
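
To make the failure mode concrete, here is a minimal sketch (not the paper's code) of ADAGRAD's per-parameter learning-rate schedule. eta0 and eps are the usual ADAGRAD base rate and stabilizer, and the gradient streams below are hypothetical stand-ins for the small- and large-gradient regimes the abstract contrasts.

import numpy as np

def adagrad_rates(grads, eta0=1.0, eps=1e-8):
    # ADAGRAD divides the base rate eta0 by the square root of the
    # running sum of squared gradients; the sum only grows, so the
    # effective learning rate can only shrink over time.
    accum = eps
    rates = []
    for g in grads:
        accum += g ** 2
        rates.append(eta0 / np.sqrt(accum))
    return rates

# Hypothetical gradient streams: modest gradients vs. the very large
# gradients that arise in stochastic variational inference for topic models.
print(adagrad_rates(np.full(100, 0.1))[-1])    # ~1.0: rate still usable
print(adagrad_rates(np.full(100, 100.0))[-1])  # ~0.001: rate has collapsed

With large gradients the denominator blows up after only a handful of updates, which is the shrinkage behavior the abstract identifies as the reason the topic parameters never fully converge.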

Citation (APA)

Lu, Y., Lund, J., & Boyd-Graber, J. (2017). Why ADAGRAD fails for online topic modeling. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 446–451). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d17-1046
