Obfuscating document stylometry to preserve author anonymity

Gary Kacmarcik; Michael Gamon

Conference Proceedings

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (2006) 444-451

DOI: 10.3115/1273073.1273131

46 Citations

143 Readers

Abstract

This paper explores ways to reduce the effectiveness of standard authorship attribution techniques so that an author A can preserve anonymity for a particular document D. We discuss feature selection and adjustment and show how this information can be fed back to the author to create a new document D' for which the calculated attribution moves away from A. Since it can be labor-intensive to adjust the document in this fashion, we attempt to quantify the amount of effort required to produce the anonymized document and introduce two levels of anonymization: shallow and deep. In our test set, we show that shallow anonymization can be achieved by making 14 changes per 1000 words, which reduces the likelihood of identifying A as the author by an average of more than 83%. For deep anonymization, we adapt the unmasking work of Koppel and Schler to provide feedback that allows the author to choose the level of anonymization.
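The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-1000-word word-frequency features, the ranking by profile gap, and the names `relative_frequencies` and `suggest_adjustments` are all assumptions made for illustration. The idea shown is only the general one of selecting features that separate author A from a comparison profile and telling the author which direction to adjust them in D.

```python
import re
from collections import Counter


def relative_frequencies(text, vocabulary):
    """Return occurrences per 1000 words for each word in vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return {w: 1000.0 * counts[w] / total for w in vocabulary}


def suggest_adjustments(document, author_corpus, other_corpus, vocabulary, top_k=5):
    """Hypothetical helper: rank words by how strongly they separate the
    author's profile from a comparison profile, and report how document D
    would need to change to match the comparison profile instead."""
    doc = relative_frequencies(document, vocabulary)
    author = relative_frequencies(author_corpus, vocabulary)
    other = relative_frequencies(other_corpus, vocabulary)

    suggestions = []
    for w in vocabulary:
        gap = abs(author[w] - other[w])  # discriminating power (assumed measure)
        if gap == 0 or doc[w] == other[w]:
            continue  # word does not discriminate, or D already matches
        direction = "use less" if doc[w] > other[w] else "use more"
        delta = round(other[w] - doc[w], 1)  # change needed, per 1000 words
        suggestions.append((gap, w, direction, delta))

    suggestions.sort(reverse=True)  # most discriminating words first
    return [(w, direction, delta) for _, w, direction, delta in suggestions[:top_k]]


if __name__ == "__main__":
    # Toy inputs, invented purely for demonstration.
    result = suggest_adjustments(
        document="upon the hill upon the road",
        author_corpus="upon upon upon the",
        other_corpus="while the while the",
        vocabulary=["upon", "while"],
    )
    for word, direction, delta in result:
        print(word, direction, delta)
```

Each suggestion corresponds to a kind of edit the author could make by hand; counting such edits per 1000 words is one way to think about the effort figure the abstract reports, though the paper's own measure may differ.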

Citation (APA)

Kacmarcik, G., & Gamon, M. (2006). Obfuscating document stylometry to preserve author anonymity. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (pp. 444–451). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1273073.1273131