Towards parameter-free blocking for scalable record linkage

  • Christen P

Abstract

Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not available otherwise, or that would be too expensive to collect. A main challenge when linking large databases is the complexity of the linkage process: potentially each record in one database has to be compared with all records in the other database. Various techniques, collectively known as ‘blocking’, have been developed to deal with this quadratic complexity. Most of these techniques require several parameters to be set by the user in order to achieve good results. In this paper we evaluate six blocking techniques within a common framework with regard to the number and quality of the candidate record pairs generated. We propose a modification to two existing techniques that reduces the variance in the quality of the blocking results over a range of parameter values, enabling more robust, practical record linkage without the need for time-consuming manual parameter tuning.
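To make the idea of blocking concrete, the following is a minimal sketch of one simple form of it: records are grouped by a blocking key, and only records sharing a key become candidate pairs. It is an illustration only and is not necessarily one of the six techniques evaluated in the report. The function names, the record layout, and the choice of key (the first three letters of a surname) are all assumptions made for this example.

```python
from collections import defaultdict


def blocking_key(record):
    # Hypothetical key chosen for illustration: first three letters of the surname.
    return record["surname"][:3].lower()


def candidate_pairs(db_a, db_b, key=blocking_key):
    # Index the second database by blocking key.
    index = defaultdict(list)
    for rec in db_b:
        index[key(rec)].append(rec["id"])

    # Pair each record in the first database only with records in the same block.
    pairs = []
    for rec in db_a:
        for other_id in index.get(key(rec), []):
            pairs.append((rec["id"], other_id))
    return pairs


# Toy data, invented for this example.
db_a = [
    {"id": "a1", "surname": "Smith"},
    {"id": "a2", "surname": "Miller"},
    {"id": "a3", "surname": "Smyth"},
]
db_b = [
    {"id": "b1", "surname": "Smith"},
    {"id": "b2", "surname": "Millar"},
    {"id": "b3", "surname": "Jones"},
]

pairs = candidate_pairs(db_a, db_b)
full_comparisons = len(db_a) * len(db_b)  # every record against every record
print(len(pairs), "candidate pairs instead of", full_comparisons)
```

In this toy data the key cuts the comparisons sharply, but it also separates "Smyth" from "Smith", so a plausible match is never compared. That trade-off between the number and the quality of candidate pairs is what makes the choice of key and parameters matter.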

Cite

Christen, P. (2007). Towards parameter-free blocking for scalable record linkage (Technical Report TR-CS-07-03). ANU Joint Computer Science Technical Report Series, The Australian National University, Canberra. Retrieved from http://cs.anu.edu.au/techreports/2007/TR-CS-07-03.html
