Private record linkage: Comparison of selected techniques for name matching

Pawel Grzebala; Michelle Cheatham

Conference Proceedings (Open Access)


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9678 593-606

DOI: 10.1007/978-3-319-34129-3_36

4 citations; 13 readers

Abstract

The rise of Big Data analytics has shown the value of analyzing all aspects of a problem by bringing together disparate data sets. Efficient and accurate record linkage algorithms are necessary to achieve this. However, records are often linked on personally identifiable information, so protecting the privacy of individuals is critical, which is what private record linkage addresses. This paper contributes to this field by studying an important component of the private record linkage problem: linking based on names while keeping those names encrypted, both on disk and in memory. We explore the applicability, accuracy, and speed of three primary approaches to this problem (along with several variations) and compare the results to those of common name-matching metrics on unprotected data. While these approaches are not new, this paper provides a thorough analysis of them on a range of datasets containing systematically introduced flaws common to name-based data entry, such as typographical errors, optical character recognition (OCR) errors, and phonetic errors.
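
The abstract does not name the three approaches it compares. As a point of reference only, the sketch below shows one technique that is widely used in private record linkage: hashing each name's character bigrams into a keyed Bloom filter and comparing filters with the Dice coefficient, next to the same Dice metric computed on unprotected bigrams. It is not necessarily one of the paper's approaches; the shared key, the filter size `m`, the hash count `k`, and the function names `bigrams`, `dice`, and `bloom_encode` are illustrative assumptions.

```python
import hashlib
import hmac


def bigrams(name: str) -> set[str]:
    """Set of character bigrams of a name, lower-cased and padded with spaces."""
    padded = f" {name.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def bloom_encode(name: str, key: bytes, m: int = 1000, k: int = 20) -> set[int]:
    """Map each bigram to k bit positions of an m-bit Bloom filter.

    The filter is returned as the set of positions set to 1. Positions come
    from HMAC-SHA256 under a secret key assumed to be shared by the data
    holders (hypothetical setup), so a party without the key cannot simply
    recompute the encoding of a guessed name.
    """
    positions = set()
    for gram in sorted(bigrams(name)):
        for i in range(k):
            # Derive the i-th hash function by prefixing the bigram with i.
            msg = f"{i}|{gram}".encode("utf-8")
            digest = hmac.new(key, msg, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % m)
    return positions


if __name__ == "__main__":
    KEY = b"hypothetical-shared-secret"
    # Baseline: Dice on unprotected bigrams (8 shared bigrams out of 10 + 10).
    print(f"{dice(bigrams('Catherine'), bigrams('Katherine')):.3f}")  # 0.800
    # Private variant: only the Bloom filter encodings are compared.
    enc_a = bloom_encode("Catherine", KEY)
    enc_b = bloom_encode("Katherine", KEY)
    print(f"{dice(enc_a, enc_b):.3f}")
```

Representing a filter as a set of positions lets the same `dice` function score both plaintext bigrams and encodings, so the two results are directly comparable; a production encoder would more likely use a fixed-length bit array. Bloom filter encodings of this kind are known to be vulnerable to frequency-based cryptanalysis, so this illustrates the matching idea rather than a hardened scheme.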

Citation (APA)

Grzebala, P., & Cheatham, M. (2016). Private record linkage: Comparison of selected techniques for name matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9678, pp. 593–606). Springer Verlag. https://doi.org/10.1007/978-3-319-34129-3_36
