Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Abstract

Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied to large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, only a few of which correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly sensitive to its parameters and requires trial and error to determine the optimal trade-off between blocking quality and the efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99% and the final average linkage precision by 28%, at the cost of a 4% reduction in average recall.
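
The ideas named in the abstract (min-hash LSH blocking, a temporal constraint, and filtering by block size and min-hash similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record layout (a name and a year), the function names, and the parameters `bands`, `rows`, `max_block_size`, `max_year_gap`, and `min_sig_sim` are all assumptions made for illustration, and the spatial constraint is left out for brevity.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    text = text.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def minhash_signature(tokens, num_hashes):
    """Min-hash signature: for each seed, the minimum hash over all tokens."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big")
            for t in tokens))
    return tuple(signature)


def candidate_pairs(records, bands=4, rows=2, max_block_size=50,
                    max_year_gap=2, min_sig_sim=0.5):
    """records maps record id -> (name, year); both fields are hypothetical."""
    num_hashes = bands * rows
    sigs = {rid: minhash_signature(qgrams(name), num_hashes)
            for rid, (name, _year) in records.items()}

    # Banding: records that share any full band of their signature share a block.
    blocks = defaultdict(list)
    for rid, sig in sigs.items():
        for b in range(bands):
            blocks[(b, sig[b * rows:(b + 1) * rows])].append(rid)

    pairs = set()
    for ids in blocks.values():
        if len(ids) > max_block_size:  # block size filter: skip oversized blocks
            continue
        for a, b in combinations(sorted(ids), 2):
            # Temporal constraint: the two years must be close enough.
            if abs(records[a][1] - records[b][1]) > max_year_gap:
                continue
            # Min-hash similarity filter: fraction of equal signature positions.
            same = sum(x == y for x, y in zip(sigs[a], sigs[b]))
            if same / num_hashes >= min_sig_sim:
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    demo = {1: ("mary smith", 1880), 2: ("mary smith", 1881),
            3: ("mary smith", 1950)}
    print(candidate_pairs(demo))
```

In this sketch, records with identical names always share every block, so whether such a pair survives depends only on the year gap and the block size limit. Lowering `max_block_size` trades recall for fewer comparisons, which mirrors the trade-off the abstract describes.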

Nanayakkara, C., & Christen, P. (2022). Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage. In International Conference on Information and Knowledge Management, Proceedings (pp. 4354–4358). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557631
