A text similarity measurement based on semantic fingerprint of characteristic phrases

Abstract

Text similarity measurement is the basis for determining how closely two or more texts match. Traditional large-scale similarity detection methods based on digital fingerprints have the advantage of high detection speed, but they are suitable only for exact-match detection. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text; keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model and then further screened to obtain the feature words. We obtain the exact meaning of each word and the semantic similarity between words from the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value. © 2020 Chinese Institute of Electronics.
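The pipeline outlined in the abstract (keyword extraction, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not say which fingerprint algorithm is used, so a SimHash-style fingerprint compared by Hamming distance is swapped in; the `CONCEPTS` dictionary is a hypothetical stand-in for a HowNet lookup; and whitespace tokenization of English text replaces the Chinese word segmentation a real system would need.

```python
import hashlib
import math
from collections import Counter

# Hypothetical stand-in for a HowNet lookup: maps a word to a canonical concept.
CONCEPTS = {"automobile": "car", "vehicle": "car", "quick": "fast", "rapid": "fast"}


def tokenize(text):
    """Whitespace tokenizer (assumption: real Chinese text needs a word segmenter)."""
    return text.lower().split()


def tfidf_keywords(doc_tokens, corpus, top_k=5):
    """Return the top_k (word, weight) pairs of one document ranked by TF-IDF."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[word] = (count / len(doc_tokens)) * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def to_concepts(weighted_words):
    """Replace each keyword by its concept and merge the weights."""
    merged = Counter()
    for word, weight in weighted_words:
        merged[CONCEPTS.get(word, word)] += weight
    return merged


def fingerprint(weighted_concepts, bits=64):
    """SimHash-style fingerprint (assumed here, not taken from the paper)."""
    totals = [0.0] * bits
    for concept, weight in sorted(weighted_concepts.items()):
        digest = hashlib.md5(concept.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """Similarity in [0, 1] derived from the Hamming distance of two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


def semantic_fingerprint(text, corpus, top_k=5):
    """Run the whole sketch: keywords -> concepts -> fingerprint."""
    return fingerprint(to_concepts(tfidf_keywords(tokenize(text), corpus, top_k)))


if __name__ == "__main__":
    texts = ["the automobile is quick", "the vehicle is rapid"]
    corpus = [tokenize(t) for t in texts]
    fp_a, fp_b = (semantic_fingerprint(t, corpus) for t in texts)
    print(similarity(fp_a, fp_b))
```

In this toy setup the two sentences share no content words, so a fingerprint built on the surface words would treat them as different texts, while mapping the words to shared concepts first makes their fingerprints coincide. That is the intuition behind substituting concepts before fingerprinting.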

Shanchen, P., Jiamin, Y., Ting, L., Hua, Z., & Hongqi, C. (2020). A text similarity measurement based on semantic fingerprint of characteristic phrases. Chinese Journal of Electronics, 29(2), 233–241. https://doi.org/10.1049/cje.2019.12.011
