Generic entity resolution with negative rules

Abstract

Entity resolution (ER), also known as deduplication or merge-purge, is the process of identifying records that refer to the same real-world entity and merging them together. In practice, ER results may contain "inconsistencies," caused either by mistakes made by the writers of the match and merge functions or by changes in the application semantics. To remove these inconsistencies, we introduce "negative rules" that disallow inconsistencies in the ER solution (ER-N). A consistent solution is then derived with guidance from a domain expert. The inconsistencies can be resolved in several ways, leading to accurate solutions. We formalize ER-N, treating the match, merge, and negative rules as black boxes, which permits expressive and extensible ER-N solutions. We identify important properties of the rules that, if satisfied, make ER-N less costly. We develop and evaluate two algorithms that find an ER-N solution based on guidance from the domain expert: the GNR algorithm, which does not assume the properties, and the ENR algorithm, which exploits them. © 2009 Springer-Verlag.
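To make the black-box framing concrete, here is a minimal Python sketch in which the match function, the merge function, and the negative rules are plain callables. It is not the paper's GNR or ENR algorithm. The frozenset record representation, the names `resolve`, `same_email`, `union`, and `two_genders`, and the greedy policy of skipping any merge whose result violates a negative rule (a crude stand-in for the domain expert's guidance) are all hypothetical choices made for illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumption: a record is a frozenset of (attribute, value) pairs,
# so merging two records can be modeled as set union.
Record = FrozenSet[Tuple[str, str]]
MatchFn = Callable[[Record, Record], bool]
MergeFn = Callable[[Record, Record], Record]
NegativeRule = Callable[[Record], bool]  # True means "this record is inconsistent"


def resolve(records: List[Record], match: MatchFn, merge: MergeFn,
            negative_rules: List[NegativeRule]) -> List[Record]:
    """Hypothetical greedy match-and-merge loop that refuses any merge
    whose result violates a negative rule."""
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        for i, other in enumerate(resolved):
            if match(current, other):
                merged = merge(current, other)
                if any(rule(merged) for rule in negative_rules):
                    continue  # this merge would be inconsistent, so skip it
                del resolved[i]
                pending.append(merged)  # compare the merged record again later
                break
        else:
            # No consistent merge was found, so current joins the solution.
            resolved.append(current)
    return resolved


# Hypothetical black-box functions for the demo below.
def same_email(a: Record, b: Record) -> bool:
    """Match: the two records share at least one email value."""
    emails_a = {v for k, v in a if k == "email"}
    emails_b = {v for k, v in b if k == "email"}
    return bool(emails_a & emails_b)


def union(a: Record, b: Record) -> Record:
    """Merge: keep every attribute value from both records."""
    return a | b


def two_genders(r: Record) -> bool:
    """Negative rule: a single entity cannot have two different genders."""
    return len({v for k, v in r if k == "gender"}) > 1


r1 = frozenset({("name", "J. Smith"), ("email", "js@x.org"), ("gender", "M")})
r2 = frozenset({("name", "John Smith"), ("email", "js@x.org")})
r3 = frozenset({("name", "Jane Smith"), ("email", "js@x.org"), ("gender", "F")})

if __name__ == "__main__":
    for record in resolve([r1, r2, r3], same_email, union, [two_genders]):
        print(sorted(record))
```

Without the negative rule, all three records collapse into one entity carrying both genders. With it, the loop above merges `r2` with `r3` first and then refuses to merge `r1`, which leaves two records. Processing the records in a different order could instead merge `r1` with `r2`. This order dependence reflects the abstract's point that inconsistencies can be resolved in several ways.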

Citation (APA)

Whang, S. E., Benjelloun, O., & Garcia-Molina, H. (2009). Generic entity resolution with negative rules. VLDB Journal, 18(6), 1261–1277. https://doi.org/10.1007/s00778-009-0136-3