Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization with Clustering Analysis

Abstract

Text similarity measurement is a basic task in natural language processing and is widely used in text information mining, news classification and clustering, artificial intelligence, and other fields. This paper proposes a text similarity measurement method named word vector distance decentralization (WVDD), which can deal with complex semantic relations in Chinese text, including sentence components, word order, and weights. Clustering analysis is then performed on the obtained similarity results, using a K-means algorithm implemented on the Spark architecture so that parallel computing accelerates the clustering. For experimental verification, the test sets consist of a large number of customer comments posted on the Jingdong website, a comprehensive online shopping mall. F-measure is used to evaluate the accuracy of the results obtained by the proposed method. The proposed method is compared with the sentence vector model (Doc2vec) and the bag-of-words model, and its superiority over both is verified. The proposed method can be applied to the analysis of network language, such as online customer comments and web chat data.
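
To make the bag-of-words baseline and the F-measure named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' WVDD method or their evaluation code. The function names `bow_cosine` and `f_measure` are hypothetical, the inputs are assumed to be already tokenized (Chinese text would first need a word segmenter), and the F-measure is assumed to be the balanced harmonic mean of precision and recall, which the abstract does not specify.

```python
import math
from collections import Counter


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity of two token lists under a bag-of-words model.

    Assumption: inputs are pre-tokenized; this is a baseline sketch,
    not the WVDD measure proposed in the paper.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def f_measure(precision, recall):
    """Balanced F-measure (assumed here): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Same words in a different order: bag-of-words cannot tell them apart.
    a = ["dog", "bites", "man"]
    b = ["man", "bites", "dog"]
    print(bow_cosine(a, b))
    print(f_measure(0.8, 0.6))
```

Because the two example sentences contain the same words, the bag-of-words score is at its maximum even though their meanings differ. This illustrates the kind of word-order insensitivity that the abstract says the proposed method is designed to handle.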

Zhou, S., Xu, X., Liu, Y., Chang, R., & Xiao, Y. (2019). Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization with Clustering Analysis. IEEE Access, 7, 107247–107258. https://doi.org/10.1109/ACCESS.2019.2932334
