Parallelizing record linkage for disclosure risk assessment

Abstract

Handling very large volumes of confidential data is becoming common practice in many organizations, such as statistical agencies. This calls for protection methods, which must be validated in terms of the quality they provide. Record Linkage (RL) makes it possible to compute the disclosure risk, a measure of the quality of a data protection method. However, the RL methods proposed in the literature are computationally costly, which poses difficulties when RL processes must be executed frequently on large data sets. Here, we propose a distributed computing technique to improve the performance of an RL process. We show that our technique not only significantly reduces the computing time of an RL process but also scales in a distributed environment. We also show that distributed computation can be complemented with SMP-based parallelization within each node, further increasing the final speedup. © 2008 Springer-Verlag Berlin Heidelberg.
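
To make the idea concrete, the following is a minimal sketch of how a disclosure risk figure might be obtained with distance-based record linkage and how the work could be split into independent chunks. It is not the technique of the paper, whose details the abstract does not give. The function names, the squared Euclidean distance, the round-robin partitioning, and the toy data are all assumptions made for illustration. A thread pool stands in for the distributed nodes only to keep the sketch self-contained; a real deployment would place the chunks on separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_index(record, originals):
    """Return the index of the original record closest to `record`
    (squared Euclidean distance, an assumption of this sketch)."""
    best_idx, best_dist = -1, float("inf")
    for idx, orig in enumerate(originals):
        dist = sum((a - b) ** 2 for a, b in zip(record, orig))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def count_correct_links(chunk, originals):
    """Count protected records in `chunk` that link back to their true original.

    `chunk` is a list of (true_index, protected_record) pairs.
    """
    return sum(1 for true_idx, rec in chunk
               if nearest_index(rec, originals) == true_idx)


def disclosure_risk(originals, protected, workers=2):
    """Fraction of protected records correctly re-linked, computed chunk by chunk."""
    indexed = list(enumerate(protected))
    # Round-robin partition of the protected file; each chunk is independent work.
    chunks = [indexed[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(count_correct_links, chunks, [originals] * workers)
        return sum(partial) / len(protected)


# Hypothetical toy data: four original records and their protected versions.
originals = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]
# Small noise on records 0-2, heavy distortion on record 3.
protected = [(1.2, 0.9), (5.1, 4.8), (8.7, 1.3), (8.0, 2.0)]
risk = disclosure_risk(originals, protected)
print(risk)  # 0.75: three of the four protected records link to their original
```

Because every protected record is compared against every original record, the cost grows with the product of the two file sizes, which is why partitioning one of the files across workers is a natural way to parallelize the process.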

Cite

Guisado-Gámez, J., Prat-Pérez, A., Nin, J., Muntés-Mulero, V., & Larriba-Pey, J. L. (2008). Parallelizing record linkage for disclosure risk assessment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5262 LNCS, pp. 190–202). Springer Verlag. https://doi.org/10.1007/978-3-540-87471-3_16
