Batch is not heavy: Learning word representations from all samples

Abstract

Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learning word representations. However, sampling methods are known to be biased, especially when the sampling distribution deviates from the true data distribution. Moreover, SGD suffers from dramatic fluctuations because each update is based on a single sample. In this work, we propose AllVec, which uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remains at the same level as SGD's, determined by the number of positive samples rather than by all samples. We evaluate AllVec on several benchmark tasks. Experiments show that AllVec outperforms sampling-based SGD methods with comparable efficiency, especially for small training corpora.
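The efficiency claim hinges on reorganizing the full-batch loss so that the sum over all word-context pairs never has to be materialized. The sketch below is a minimal illustration of the kind of algebraic identity that makes this possible, not the paper's actual implementation: the squared-dot-product term, the uniform weighting it implies, and the names P, Q, m, n, k are all simplifying assumptions. For embedding matrices P (m x k) and Q (n x k), the all-pairs term sum over u,v of (p_u . q_v)^2 equals tr((P^T P)(Q^T Q)), which can be evaluated in O((m + n) k^2) time instead of O(m n k).

    import numpy as np

    # Illustrative sizes (assumed, not from the paper):
    # m target words, n context words, k embedding dimensions.
    m, n, k = 500, 400, 32
    rng = np.random.default_rng(0)
    P = rng.normal(size=(m, k))  # target-word embeddings
    Q = rng.normal(size=(n, k))  # context-word embeddings

    # Naive evaluation of the all-pairs term: O(m*n*k) time, O(m*n) memory.
    naive = np.sum((P @ Q.T) ** 2)

    # Decomposed evaluation: sum_{u,v} (p_u . q_v)^2 = tr((P^T P)(Q^T Q)).
    # P^T P and Q^T Q are symmetric k x k matrices, so the trace of their
    # product is just the elementwise-product sum; total cost is
    # O((m + n) * k^2), independent of the m*n pair count.
    fast = np.sum((P.T @ P) * (Q.T @ Q))

    assert np.allclose(naive, fast)

With a decomposition of this kind, the only per-pair work left in a full-batch gradient is over the observed (positive) pairs, which is consistent with the abstract's claim that AllVec's cost is governed by the number of positive samples rather than by all samples.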

Cite (APA)
Xin, X., Yuan, F., He, X., & Jose, J. M. (2018). Batch is not heavy: Learning word representations from all samples. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1, pp. 1853–1862). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-1172
