Word2Vec is a popular algorithm used for generating dense vector representations of words in large corpora using unsupervised learning. The resulting vectors have been shown to capture semantic relationships between the corresponding words and are used extensively for many downstream natural language processing (NLP) tasks like sentiment analysis, named entity recognition, and machine translation. Most open-source implementations of the algorithm have been parallelized for multi-core CPU architectures, including the original C implementation by Mikolov et al. [1] and FastText [2] by Facebook. A few other implementations have attempted to leverage GPU parallelization, but at the cost of accuracy and scalability. In this work, we present BlazingText, a highly optimized implementation of Word2Vec in CUDA that can leverage multiple GPUs for training. BlazingText can achieve a training speed of up to 43M words/sec on 8 GPUs, which is a 9x speedup over 8-threaded CPU implementations, with minimal effect on the quality of the embeddings.
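To illustrate the kind of embeddings the abstract describes, below is a minimal sketch using the open-source gensim library (gensim >= 4.0 assumed), not the paper's BlazingText implementation. The toy corpus and hyperparameters are placeholders; meaningful semantic relationships, such as the classic king - man + woman ≈ queen analogy, only emerge when training on large corpora.

```python
# Minimal Word2Vec sketch with gensim (>= 4.0), standing in for the
# CUDA implementation described in the paper. Corpus and hyperparameters
# below are illustrative only.
from gensim.models import Word2Vec

# Tiny stand-in corpus: a list of tokenized sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # dimensionality of the dense word vectors
    window=5,         # context window size
    min_count=1,      # keep every token in this toy corpus
    sg=1,             # skip-gram variant
    workers=8,        # multi-threaded CPU training, the kind of baseline
                      # BlazingText's GPU speedups are measured against
)

# The learned vectors support similarity and analogy queries; results are
# only reliable at scale, not on a toy corpus like this one.
print(model.wv.similarity("king", "queen"))
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"]))
```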
Citation: Gupta, S., & Khare, V. (2017). BlazingText: Scaling and Accelerating Word2Vec using Multiple GPUs. In Proceedings of the Machine Learning in HPC Environments Workshop (MLHPC 2017), held in conjunction with SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis. ACM. https://doi.org/10.1145/3146347.3146354