A survey on removal of duplicate records in database

Abstract

Deduplication is the task of identifying records in a repository that represent the same object or entity. The difficulty is that the same data may be represented differently in each database. When databases are merged, records that differ in schema, writing style, or spelling may still refer to the same entity; such duplicates are called replicas. Removing replicas from a repository improves the quality of its information and saves processing time. This paper presents a thorough analysis of similarity metrics for identifying similar fields in records, together with a set of algorithms and duplicate detection tools for detecting and removing replicas from a database.
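As a rough illustration of how a similarity metric can flag replicas, the sketch below compares field values using normalized Levenshtein edit distance. This metric is chosen here only as a familiar example; the abstract does not say which metrics the paper covers. The function names, the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; case and surrounding whitespace are ignored."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def find_replicas(records: list[str], threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample data: the first two entries differ by one misspelling.
records = ["John Smith", "Jon Smith", "Mary Jones"]
replica_pairs = find_replicas(records)
```

The pairwise loop is quadratic in the number of records, so a sketch like this suits only small inputs.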

Karthigha, M., & Krishna Anand, S. (2013). A survey on removal of duplicate records in database. Indian Journal of Science and Technology, 6(4), 4306–4311. https://doi.org/10.17485/ijst/2013/v6i4.11
