A scalable and efficient subgroup blocking scheme for multidatabase record linkage

Abstract

Record linkage is a commonly used task in data integration to facilitate the identification of matching records that refer to the same entity from different databases. The scalability of multidatabase record linkage (MDRL) is significantly challenged with the increase of both the sizes and the number of databases that are to be linked. Identifying matching records across subgroups of databases is an important aspect in MDRL that has not been addressed so far. We propose a scalable subgroup blocking approach for MDRL that uses an efficient search over a graph structure to identify similar blocks of records that need to be compared across subgroups of multiple databases. We provide an analysis of our technique in terms of complexity and blocking quality. We conduct an empirical study on large real-world datasets that shows our approach is scalable with the size of subgroups and the number of databases, and outperforms an existing state-of-the-art blocking technique for MDRL.

Cite

Ranbaduge, T., Vatsalan, D., & Christen, P. (2018). A scalable and efficient subgroup blocking scheme for multidatabase record linkage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10939 LNAI, pp. 15–27). Springer Verlag. https://doi.org/10.1007/978-3-319-93040-4_2
