Suffix array blocking for efficient record linkage and de-duplication in sliding window fashion

Abstract

Record linkage is an essential process in data integration, used to merge, match and remove duplicates across several databases that refer to the same entities. De-duplication is the process of removing duplicate records within a single database. Because of the complexity of today's databases, matching records within a single database is an important task. Indexing techniques are used to implement record linkage and de-duplication efficiently. Our additional grouping technique, which uses the Jaro-Winkler similarity measure, exploits the ordering used by the index to merge similar blocks at minimal extra cost, resulting in much higher accuracy while retaining the high scalability of the base suffix array method. We carry out an in-depth analysis of our method and present results from experiments using the Cora, restaurant and real identity data sets, which highlight the importance of efficient indexing and blocking in real-world applications where data sets contain a large number of records. This paper presents suffix array blocking for efficient record linkage and de-duplication in a sliding window fashion.
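
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: index each record under the suffixes of its blocking key, sort the suffixes, then slide a small window over that sorted order and merge neighbouring blocks whose suffixes are similar under Jaro-Winkler. All function names, the toy records, and the parameters `min_len`, `max_block`, `threshold` and `window` are illustrative assumptions, not values taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s, t):
    """Standard Jaro similarity between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    reach = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - reach), min(len(t), i + reach + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a common prefix of up to 4 characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def build_index(records, min_len=4, max_block=10):
    """Map every suffix (length >= min_len) of each key to its record ids."""
    index = defaultdict(set)
    for rid, key in records.items():
        for start in range(len(key) - min_len + 1):
            index[key[start:]].add(rid)
    # Assumed pruning rule: drop suffixes shared by too many records.
    return {suf: ids for suf, ids in index.items() if len(ids) <= max_block}


def candidate_pairs(index, threshold=0.9, window=2):
    """Slide a window over the sorted suffixes and merge similar blocks."""
    keys = sorted(index)
    pairs = set()
    for i, key in enumerate(keys):
        block = set(index[key])
        for other in keys[i + 1:i + window]:
            if jaro_winkler(key, other) >= threshold:
                block |= index[other]
        pairs.update(combinations(sorted(block), 2))
    return pairs


# Hypothetical blocking keys, already lower-cased with spaces removed.
records = {1: "katherine", 2: "catherine", 3: "katherina", 4: "johnsmith"}
pairs = candidate_pairs(build_index(records))
print(sorted(pairs))
```

In this toy run, records 1 and 2 already share a block through the exact suffix "atherine", while record 3 only joins them because the neighbouring suffixes "atherina" and "atherine" pass the similarity threshold. Because the suffixes are already sorted, each one is compared only with its next `window - 1` neighbours, which is why the merge step adds little cost on top of the base index.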

Cite

Warke, Y. (2017). Suffix array blocking for efficient record linkage and de-duplication in sliding window fashion. In Advances in Intelligent Systems and Computing (Vol. 468, pp. 57–65). Springer Verlag. https://doi.org/10.1007/978-981-10-1675-2_7