Abstract
This work develops a new statistical understanding of word embeddings induced from transformed count data. Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient and effective recovery of model parameters with lexical semantics. We further show in experiments that these techniques empirically outperform existing spectral methods on word similarity and analogy tasks, and are also competitive with other popular methods such as WORD2VEC and GLOVE.
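As a rough illustration of the pipeline the abstract describes (transform the counts, then apply a CCA-style decomposition), here is a minimal sketch in Python with NumPy. It assumes a square-root count transformation, scaling by the inverse square roots of the row and column sums of the transformed matrix, and unit-length normalization of the resulting rows. The function name `cca_style_embeddings`, the toy matrix, and these specific choices are illustrative assumptions, not the authors' exact procedure; the paper should be consulted for the precise transformation and scaling.

```python
import numpy as np


def cca_style_embeddings(counts, dim):
    """Sketch: embeddings from a word-by-context co-occurrence count matrix.

    Assumptions (not taken from the abstract): square-root transform,
    marginals computed from the transformed matrix, unit-norm rows.
    """
    counts = np.asarray(counts, dtype=float)

    # Element-wise count transformation (square root chosen for illustration).
    transformed = np.sqrt(counts)

    # CCA-style scaling: divide each entry by the square roots of its
    # row and column marginals. The floor avoids division by zero.
    row_sums = np.maximum(transformed.sum(axis=1), 1e-12)
    col_sums = np.maximum(transformed.sum(axis=0), 1e-12)
    scaled = transformed / np.sqrt(row_sums)[:, None] / np.sqrt(col_sums)[None, :]

    # Truncated SVD: keep the left singular vectors for the top `dim` values.
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    embeddings = u[:, :dim]

    # Normalize each word vector to unit length.
    norms = np.maximum(np.linalg.norm(embeddings, axis=1, keepdims=True), 1e-12)
    return embeddings / norms


# Hypothetical toy counts: 4 words (rows) by 3 context words (columns).
toy_counts = [[10, 0, 2], [8, 1, 3], [0, 9, 1], [1, 7, 0]]
toy_embeddings = cca_style_embeddings(toy_counts, dim=2)
```

Each row of `toy_embeddings` is a two-dimensional unit vector for one word; on real data the count matrix would be large and sparse, so a sparse or randomized SVD would be the more practical choice.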
Cite
Stratos, K., Collins, M., & Hsu, D. (2015). Model-based word embeddings from decompositions of count matrices. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 1, pp. 1282–1291). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-1124