Improving word representation quality trained by word2vec via a more efficient hierarchical clustering method

Abstract

In traditional word2vec methods, the hierarchical softmax algorithm uses the whole vocabulary to construct a Huffman tree, so training on each pair of words takes only logarithmic time. However, because this construction does not consider the cooperation among words in the corpus, it reduces the performance of the language model and the quality of the trained word vectors. In this paper, we substitute a purely data-driven method for the original Huffman-tree method to rebuild the binary tree. The new construction method uses the semantic and syntactic cooperation of words to cluster the words hierarchically. This cooperation is reflected in the word vectors collected from the initial Huffman-tree training procedure. Our method substantially improves the performance of word vectors on semantic and syntactic tasks.
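The contrast between the two tree constructions can be illustrated with a minimal sketch. The first function builds Huffman codes from word counts, as hierarchical softmax does; the code length is the number of binary decisions needed to reach a word. The second function is only an assumption for illustration: the abstract does not specify the clustering procedure, so a simple top-down bisecting split over word vectors stands in for it. The names `huffman_codes`, `cluster_codes`, and `dist2` are hypothetical and do not come from the paper.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {word: bit string} for a Huffman tree built from word counts."""
    tie = count()  # unique tie-breaker so heapq never compares tree nodes
    heap = [(f, next(tie), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a word
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def cluster_codes(vectors):
    """Assumed stand-in for the paper's clustering: recursively bisect the
    word vectors so that nearby words share a code prefix."""
    codes = {}

    def split(words, prefix):
        if len(words) == 1:
            codes[words[0]] = prefix or "0"
            return
        # Pick two far-apart seed words, then assign each word to the nearer seed.
        a = max(words, key=lambda w: dist2(vectors[w], vectors[words[0]]))
        b = max(words, key=lambda w: dist2(vectors[w], vectors[a]))
        left = [w for w in words
                if dist2(vectors[w], vectors[a]) <= dist2(vectors[w], vectors[b])]
        in_left = set(left)
        right = [w for w in words if w not in in_left]
        if not right:  # all vectors coincide: fall back to an even split
            mid = len(words) // 2
            left, right = words[:mid], words[mid:]
        split(left, prefix + "0")
        split(right, prefix + "1")

    split(list(vectors), "")
    return codes
```

In a two-pass setup like the one the abstract describes, the vectors passed to `cluster_codes` would come from an initial Huffman-tree training run, and the resulting codes would replace the Huffman codes for a second run. Unlike Huffman coding, this sketch ignores word frequency, so it makes no claim about code length.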

Cite

Yuan, Z., Ban, X., & Hu, J. (2018). Improving word representation quality trained by word2vec via a more efficient hierarchical clustering method. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11151 LNCS, pp. 299–303). Springer Verlag. https://doi.org/10.1007/978-3-030-00560-3_43
