Near duplicate text detection using frequency-biased signatures

Yifang Sun; Jianbin Qin; Wei Wang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 8180 LNCS(PART 1) 277-291

DOI: 10.1007/978-3-642-41230-1_24

Abstract

As electronic documents become more widespread, there is growing demand for finding documents that are complete or partial duplicates of one another. In this paper, we propose a near-duplicate text detection framework that uses signatures to reduce space usage and query time. We also propose a novel signature selection algorithm that exploits the collection frequency of q-grams. We compare our algorithm with Winnowing, one of the state-of-the-art signature selection algorithms, and show that our algorithm achieves much higher accuracy at a lower time and space cost. We perform extensive experiments to verify these conclusions. © 2013 Springer-Verlag.
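
The abstract contrasts Winnowing with a signature selection rule driven by the collection frequency of q-grams. The Python sketch below is illustrative only. `winnow` follows the published Winnowing scheme (keep the minimum hash in every window of w consecutive q-gram hashes). `frequency_biased_signature` and its `max_cf` threshold are a hypothetical stand-in for the frequency-biased idea, not the authors' algorithm, which the abstract does not specify. All function names and parameters are assumptions made for illustration.

```python
from collections import Counter
import hashlib


def qgrams(text: str, q: int) -> list[str]:
    """Overlapping character q-grams of `text`, in order of position."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def stable_hash(gram: str) -> int:
    """Deterministic 64-bit hash (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def winnow(text: str, q: int, w: int) -> set[int]:
    """Winnowing: keep the minimum q-gram hash of every window of w consecutive
    hashes. Positions are not recorded here. Any shared substring of length
    >= w + q - 1 covers a full window, so it yields at least one shared hash."""
    hashes = [stable_hash(g) for g in qgrams(text, q)]
    if not hashes:
        return set()
    w = min(w, len(hashes))  # short texts: one window covering every hash
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def collection_frequency(docs: list[str], q: int) -> Counter:
    """Total occurrences of each q-gram across the whole collection."""
    cf: Counter = Counter()
    for doc in docs:
        cf.update(qgrams(doc, q))
    return cf


def frequency_biased_signature(text: str, q: int, cf: Counter, max_cf: int) -> set[str]:
    """HYPOTHETICAL rule (not the paper's algorithm): keep only the q-grams whose
    collection frequency is at most `max_cf`. The decision depends on the q-gram
    alone, so two documents sharing a rare q-gram both select it."""
    return {g for g in set(qgrams(text, q)) if cf[g] <= max_cf}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two signatures; two empty signatures count as 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    docs = [
        "near duplicate text detection using signatures",
        "near-duplicate text detection with signatures",
        "an unrelated sentence about query processing",
    ]
    cf = collection_frequency(docs, q=3)
    a, b, c = (frequency_biased_signature(d, 3, cf, max_cf=2) for d in docs)
    print(jaccard(a, b), jaccard(a, c))
    print(jaccard(winnow(docs[0], 3, 4), winnow(docs[1], 3, 4)))
```

One design point the sketch is meant to surface: a signature is only useful if two documents that share text tend to select the same signature elements. Winnowing gets this from windows of hashes, while the hypothetical threshold rule gets it by making the keep/drop decision depend on the q-gram alone and dropping very frequent q-grams that carry little evidence of duplication.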

Citation (APA)

Sun, Y., Qin, J., & Wang, W. (2013). Near duplicate text detection using frequency-biased signatures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8180 LNCS, pp. 277–291). https://doi.org/10.1007/978-3-642-41230-1_24