Recent developments in data de-identification technologies offer sophisticated solutions for protecting medical data, especially when the data is to be provided for secondary purposes such as clinical or biomedical research. To determine to what degree an approach, along with its tool, is usable and effective, this paper considers a number of de-identification tools that aim to reduce the re-identification risk of published medical data while preserving its statistical meaning. We evaluate the residual risk of re-identification through an experimental evaluation of the most stable research-based tools, applied to our Electronic Health Records (EHRs) database, to assess which tool performs better with different quasi-identifiers. Our evaluation criteria are quantitative, in contrast to other assessments that are descriptive and qualitative. Comparing the individual disclosure risk and information loss of each published dataset, we find that the μ-Argus tool performs better. In addition, the generalization method is considerably better than the suppression method at reducing risk and avoiding information loss. We also find that sdcMicro has the best scalability among its counterparts, as observed experimentally on a virtual dataset consisting of 33 variables and 10,000 records.
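To make the terms quasi-identifier, generalization, suppression, and individual disclosure risk concrete, the following is a minimal sketch and not the method used by μ-Argus or sdcMicro. It assumes a common simplification in which a record's individual risk is the reciprocal of the number of records sharing its quasi-identifier values. The toy records, the choice of age and ZIP code as quasi-identifiers, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis).
# Age and ZIP code are treated as the quasi-identifiers (an assumption).
RECORDS = [
    (34, "10001", "flu"),
    (36, "10002", "asthma"),
    (35, "10001", "flu"),
    (52, "20001", "diabetes"),
    (58, "20002", "flu"),
    (51, "20001", "asthma"),
]


def quasi_identifiers(record):
    """Return the quasi-identifier part of a record (age, zip_code)."""
    age, zip_code, _diagnosis = record
    return (age, zip_code)


def generalize(record):
    """Generalization: coarsen age to a decade band and ZIP to a 3-digit prefix."""
    age, zip_code, diagnosis = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**", diagnosis)


def suppress_age(record):
    """Suppression: replace the age value with a placeholder, keep the rest."""
    _age, zip_code, diagnosis = record
    return ("*", zip_code, diagnosis)


def individual_risks(records):
    """Simplified individual risk: 1 / (size of the record's equivalence class)."""
    class_sizes = Counter(quasi_identifiers(r) for r in records)
    return [1 / class_sizes[quasi_identifiers(r)] for r in records]


def mean_risk(records):
    """Average of the simplified individual risks over all records."""
    risks = individual_risks(records)
    return sum(risks) / len(risks)


if __name__ == "__main__":
    generalized = [generalize(r) for r in RECORDS]
    suppressed = [suppress_age(r) for r in RECORDS]
    for label, data in [("raw", RECORDS),
                        ("generalized", generalized),
                        ("age suppressed", suppressed)]:
        print(f"{label:15s} max risk = {max(individual_risks(data)):.3f}, "
              f"mean risk = {mean_risk(data):.3f}")
```

On this toy data every raw record is unique on its quasi-identifiers, so each has risk 1. Any difference between the generalized and suppressed variants here is an artifact of the hand-picked records and says nothing about the experimental results reported in the paper; information loss is not modeled in this sketch.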
CITATION STYLE
Liu, Z., Qamar, N., & Qian, J. (2014). A quantitative analysis of the performance and scalability of de-identification tools for medical data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8315, pp. 274–289). Springer Verlag. https://doi.org/10.1007/978-3-642-53956-5_18