Ngram2vec: Learning improved word representations from ngram co-occurrence statistics

55 citations · 202 Mendeley readers

Abstract

Most existing word representation methods limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, the PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and word similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful in many respects, such as finding antonyms and collocations. In addition, a novel approach to building the co-occurrence matrix is proposed to alleviate the hardware burden introduced by ngrams.
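The core idea of replacing word–word co-occurrence counts with word–ngram counts can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation; the function name and parameters (`window`, `max_n`) are assumptions for the example.

```python
from collections import Counter

def cooccurrence_counts(corpus, window=2, max_n=2):
    """Count (center word, context ngram) co-occurrences.

    corpus: list of tokenized sentences (lists of word strings).
    Context ngrams of order 1..max_n are drawn from a symmetric
    window of `window` words around each center position.
    """
    counts = Counter()
    for sentence in corpus:
        for i, center in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for n in range(1, max_n + 1):
                    if j + n > hi:
                        break  # ngram would run past the window
                    if j <= i < j + n:
                        continue  # skip ngrams overlapping the center word
                    gram = tuple(sentence[j:j + n])
                    counts[(center, gram)] += 1
    return counts
```

These sparse counts are exactly what the matrix-based methods (PPMI, SVD) consume; storing ngram contexts rather than single words inflates the matrix, which is what motivates the paper's proposed matrix-construction approach.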

Citation (APA)

Zhao, Z., Liu, T., Li, S., Li, B., & Du, X. (2017). Ngram2vec: Learning improved word representations from ngram co-occurrence statistics. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 244–253). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d17-1023
