Sentence-level Privacy for Document Embeddings

Casey Meehan; Khalil Mrini; Kamalika Chaudhuri

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2022), Vol. 1, pp. 3367–3380

DOI: 10.18653/v1/2022.acl-long.238

Abstract

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose ε-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding ε-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
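
For readers unfamiliar with the term, the indistinguishability guarantee described in the abstract can be written out in the standard form used for pure differential privacy, with ε as the privacy parameter. The following is a sketch under stated assumptions, not the paper's own notation: the symbols M (the randomized embedding mechanism), x and x' (two documents), and O (a set of possible output embeddings) are introduced here for illustration only.

```latex
% Assumed notation (not taken from the abstract):
%   \mathcal{M}  : randomized mechanism mapping a document to an embedding
%   x, x'        : documents with the same number of sentences that differ
%                  in at most one sentence
%   \mathcal{O}  : any measurable set of output embeddings
%   \epsilon     : privacy parameter, \epsilon \ge 0
\Pr\bigl[\mathcal{M}(x) \in \mathcal{O}\bigr]
  \;\le\; e^{\epsilon} \, \Pr\bigl[\mathcal{M}(x') \in \mathcal{O}\bigr]
```

Because the inequality must hold for every such pair in both directions, an observer of the embedding learns little about which sentence occupied any one position; smaller ε means the two output distributions are closer and the guarantee is stronger.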

Cite

Meehan, C., Mrini, K., & Chaudhuri, K. (2022). Sentence-level Privacy for Document Embeddings. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3367–3380). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.238