An information retrieval approach to document sanitization

David F. Nettleton; Daniel Abril

Journal Article

Studies in Computational Intelligence (2015) 567 151-166

DOI: 10.1007/978-3-319-09885-2_9

Abstract

In this paper we use information retrieval metrics to evaluate the effect of a document sanitization process, measuring both information loss and risk of disclosure. To sanitize the documents, we have developed a semi-automatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration. It comprises two main, independent steps: (i) identifying and anonymizing specific person names and data, and (ii) concept generalization based on WordNet categories, in order to identify words categorized as classified. Finally, we manually revise the text from a contextual point of view to eliminate complete sentences, paragraphs and sections where necessary. For empirical tests, we use a subset of the Wikileaks Cables, made up of documents relating to five key news items that were revealed by the cables.
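The two automatic steps and the retrieval-based evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the name list, the generalization map (a toy stand-in for WordNet category lookups), the sample documents, and the helper names `sanitize`, `retrieve` and `precision_recall` are all hypothetical. The abstract does not say how information loss and disclosure risk are computed, so the sketch assumes one plausible reading: compare precision and recall of a simple keyword search before and after sanitization.

```python
import re

# Hypothetical inputs: names to mask and a toy generalization map standing in
# for WordNet category lookups (assumptions, not the authors' resources).
NAMES = {"alice smith"}
GENERALIZE = {"missile": "weapon", "uranium": "material"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping the name placeholder."""
    return re.findall(r"\[person\]|[a-z]+", text.lower())


def sanitize(text):
    """Step (i): mask listed names. Step (ii): generalize listed sensitive words."""
    out = text.lower()
    for name in NAMES:
        out = out.replace(name, "[person]")
    return " ".join(GENERALIZE.get(tok, tok) for tok in tokenize(out))


def retrieve(docs, query):
    """Return the ids of documents that contain every query term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))}


def precision_recall(retrieved, relevant):
    """Standard precision and recall; both are 0.0 when undefined."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample documents, for illustration only.
docs = {
    "d1": "Alice Smith discussed the missile shipment.",
    "d2": "The uranium shipment was delayed.",
    "d3": "Trade talks continued in the capital.",
}
sanitized = {doc_id: sanitize(text) for doc_id, text in docs.items()}

# Sensitive query: lower recall after sanitization suggests lower disclosure risk.
risk = precision_recall(retrieve(sanitized, "missile shipment"), {"d1"})

# Non-sensitive query: unchanged recall suggests little information loss.
utility = precision_recall(retrieve(sanitized, "shipment"), {"d1", "d2"})
```

Under these assumptions, a query on a sensitive term no longer finds the sanitized document, while a query on a neutral term still finds both relevant documents. The manual contextual revision described in the abstract is not modeled here.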

Cite

Nettleton, D. F., & Abril, D. (2015). An information retrieval approach to document sanitization. Studies in Computational Intelligence, 567, 151–166. https://doi.org/10.1007/978-3-319-09885-2_9
