Sentence-level Privacy for Document Embeddings

Abstract

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose ε-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding ε-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
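The abstract does not spell out how the ε-SentDP guarantee is achieved. The sketch below shows one standard way to make an embedding ε-indistinguishable under the substitution of a single sentence, under assumptions that are ours and not a description of the authors' DeepCandidate code: the output is drawn from a fixed, document-independent candidate set using the exponential mechanism, and each candidate is scored by a coordinate-wise approximation of Tukey depth (a robust-statistics measure of how central a point is among the document's sentence embeddings). Substituting one sentence changes any candidate's score by at most 1, so sampling with probability proportional to exp(ε · score / 2) is ε-differentially private with respect to that substitution. The function names (`coordinate_depth`, `selection_probabilities`, `private_document_embedding`) and the toy vectors are hypothetical; real sentence embeddings would come from a language-model encoder.

```python
# Hypothetical sketch of an epsilon-indistinguishable selection step.
# Function names and the candidate-set setup are illustrative assumptions,
# not the authors' DeepCandidate implementation.
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def coordinate_depth(candidate: Vector, points: List[Vector]) -> int:
    """Coordinate-wise approximation of Tukey depth.

    For each axis, count the points on either side of the candidate
    (ties count on both sides) and keep the smaller count; the score is
    the minimum over axes. Substituting one point changes every count by
    at most 1, so the score has sensitivity 1.
    """
    depth = len(points)
    for j, c_j in enumerate(candidate):
        below = sum(1 for p in points if p[j] <= c_j)
        above = sum(1 for p in points if p[j] >= c_j)
        depth = min(depth, below, above)
    return depth


def selection_probabilities(candidates: List[Vector],
                            sentence_embeddings: List[Vector],
                            epsilon: float) -> List[float]:
    """Exponential mechanism: P(candidate) proportional to exp(epsilon * score / 2)."""
    scores = [coordinate_depth(c, sentence_embeddings) for c in candidates]
    top = max(scores)  # shifting every score by the max cancels out after normalization
    weights = [math.exp(epsilon * (s - top) / 2.0) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def private_document_embedding(candidates: List[Vector],
                               sentence_embeddings: List[Vector],
                               epsilon: float,
                               rng: random.Random) -> Vector:
    """Sample one candidate; the candidate set must not depend on the document."""
    probs = selection_probabilities(candidates, sentence_embeddings, epsilon)
    index = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[index]


if __name__ == "__main__":
    # Toy 2-D "sentence embeddings" for a five-sentence document (made-up values).
    doc = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.3], [0.1, 0.0], [0.3, 0.2]]
    # Candidates fixed in advance (e.g. from public data), never taken from `doc`.
    candidates = [[0.15, 0.15], [1.0, 1.0], [-1.0, 0.5]]
    print(private_document_embedding(candidates, doc, epsilon=1.0, rng=random.Random(0)))
```

Raising ε concentrates probability on the deepest candidate, which improves utility at the cost of privacy. The guarantee in this sketch relies on the candidate set being chosen without looking at the private document; if candidates were built from the document's own sentences, the output could reveal a sentence directly.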

Citation

Meehan, C., Mrini, K., & Chaudhuri, K. (2022). Sentence-level Privacy for Document Embeddings. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 3367–3380). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.238