A text similarity measurement based on semantic fingerprint of characteristic phrases

Abstract

Text similarity measurement is the basis for determining the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed but are suitable only for exact-match detection. We propose a method of Chinese text similarity measurement based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to preprocess the text; keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model and further screened to obtain feature words. The HowNet semantic dictionary is used to determine the exact meaning of each word and the semantic similarity between words. We then substitute concepts for words to obtain feature phrases, generate a semantic fingerprint from them, and calculate text similarity. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in accuracy rate, recall rate, and F-value. © 2020 Chinese Institute of Electronics.
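
The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already segmented into tokens (Chinese word segmentation is out of scope), TF-IDF uses one common smoothed-IDF variant, a plain dictionary (`concept_map`, hypothetical) stands in for a HowNet concept lookup, and the fingerprint is a SimHash-style construction compared by normalized Hamming distance. That last part is a swapped-in technique, since the abstract does not say how the semantic fingerprint is built or compared. All function names are illustrative.

```python
import hashlib
import math
from collections import Counter


def tf_idf_scores(doc, corpus):
    """Score every token of a pre-segmented document by TF-IDF against a corpus."""
    if not doc:
        return {}
    n_docs = len(corpus)
    scores = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)
        # Smoothed IDF variant (an assumption; the paper's exact weighting is not given).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc)) * idf
    return scores


def simhash(weights, bits=64):
    """SimHash-style fingerprint: each feature votes on every bit with its weight."""
    if not 0 < bits <= 256:
        raise ValueError("bits must be in 1..256 (SHA-256 supplies 256 bits)")
    votes = [0.0] * bits
    for feature, weight in weights.items():
        h = int.from_bytes(hashlib.sha256(feature.encode("utf-8")).digest(), "big")
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def semantic_fingerprint(doc, corpus, concept_map, k=10, bits=64):
    """Keep the top-k TF-IDF keywords, map each to a concept, and fingerprint the concepts."""
    scores = tf_idf_scores(doc, corpus)
    keywords = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    concept_weights = {}
    for word in keywords:
        # Hypothetical stand-in for a HowNet lookup: unmapped words keep their surface form.
        concept = concept_map.get(word, word)
        concept_weights[concept] = concept_weights.get(concept, 0.0) + scores[word]
    return simhash(concept_weights, bits)


def fingerprint_similarity(a, b, bits=64):
    """Similarity as 1 - normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / bits


# Usage with toy, already-segmented documents (English placeholders for Chinese tokens).
corpus = [
    ["computer", "text", "similarity", "detection"],
    ["PC", "text", "similarity", "detection"],
    ["weather", "rain"],
]
concepts = {"computer": "COMPUTER", "PC": "COMPUTER"}  # hypothetical concept mapping
fa = semantic_fingerprint(corpus[0], corpus, concepts)
fb = semantic_fingerprint(corpus[1], corpus, concepts)
print(fingerprint_similarity(fa, fb))  # 1.0: the two synonyms collapse to one concept
```

Mapping synonyms to a shared concept before hashing is what lets two texts with different surface words produce the same fingerprint, which a fingerprint computed over raw words would generally miss.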

Cite (APA style)

Shanchen, P., Jiamin, Y., Ting, L., Hua, Z., & Hongqi, C. (2020). A text similarity measurement based on semantic fingerprint of characteristic phrases. Chinese Journal of Electronics, 29(2), 233–241. https://doi.org/10.1049/cje.2019.12.011
