Combining word embedding and semantic lexicon for Chinese word similarity computation

Abstract

Large corpus-based embedding methods have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect the human knowledge encoded in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embeddings with Tongyici Cilin. We also use retrieval techniques to extend the contexts of word pairs and compute similarity scores that weakly supervise the selection of the better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we ranked second with Spearman/Pearson rank correlation coefficients of 0.457/0.455. After the submission, we improved the embedding model by merging an English model into the Chinese one and learning co-occurrence sequences with LSTM networks. Our final results are 0.541/0.514, which, to the best of our knowledge, outperform the state of the art.
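The abstract does not give the combination formula, so the following Python sketch only illustrates one common way to fuse an embedding-based cosine score with a lexicon-derived score and to evaluate the result against gold ratings with Spearman/Pearson correlation, as in the CLSC task. The toy embedding vectors, the cilin_similarity lookup table, and the interpolation weight alpha are hypothetical placeholders, not the authors' actual method or data.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# Hypothetical pre-trained embeddings (word -> vector); in practice these
# would come from a model trained on a large Chinese corpus.
embeddings = {
    "高兴": np.array([0.8, 0.1, 0.3]),
    "快乐": np.array([0.7, 0.2, 0.4]),
    "桌子": np.array([0.1, 0.9, 0.2]),
}

# Hypothetical lexicon-based scores, e.g. derived from the Tongyici Cilin
# hierarchy (shared-category depth); real scores would be computed from
# the thesaurus structure itself.
cilin_similarity = {
    ("高兴", "快乐"): 0.90,
    ("高兴", "桌子"): 0.05,
    ("快乐", "桌子"): 0.10,
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_similarity(w1, w2, alpha=0.6):
    """Linear interpolation of embedding and lexicon scores.

    alpha is an illustrative weight, not a value from the paper.
    """
    emb = cosine(embeddings[w1], embeddings[w2])
    lex = cilin_similarity.get((w1, w2), cilin_similarity.get((w2, w1), 0.0))
    return alpha * emb + (1 - alpha) * lex

# Evaluate against gold-standard human ratings (toy values), reporting the
# same Spearman/Pearson rank correlations used in the shared task.
pairs = [("高兴", "快乐"), ("高兴", "桌子"), ("快乐", "桌子")]
gold = [9.2, 1.1, 1.5]
pred = [combined_similarity(a, b) for a, b in pairs]
rho, _ = spearmanr(gold, pred)
r, _ = pearsonr(gold, pred)
print(f"Spearman: {rho:.3f}, Pearson: {r:.3f}")
```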

Cite

APA

Pei, J., Zhang, C., Huang, D., & Ma, J. (2016). Combining word embedding and semantic lexicon for Chinese word similarity computation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10102, pp. 766–777). Springer Verlag. https://doi.org/10.1007/978-3-319-50496-4_69
