TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations

Xin Zhou; Yi Lu; Ruotian Ma; Tao Gui; Yuran Wang; Yong Ding; Yibo Zhang; Qi Zhang; Xuanjing Huang

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023) 5459-5473

DOI: 10.18653/v1/2023.findings-acl.337

Abstract

In real-world applications, pre-trained language models are typically deployed on the cloud, allowing clients to upload data and perform compute-intensive inference remotely. To avoid sharing sensitive data directly with service providers, clients can upload numerical representations rather than plain text to the cloud. However, recent text reconstruction techniques have demonstrated that it is possible to transform representations into original words, suggesting that privacy risk remains. In this paper, we propose TextObfuscator, a novel framework for preserving inference privacy by applying random perturbations to clustered representations. The random perturbations make each word representation indistinguishable from surrounding functionally similar representations, thus obscuring word information while retaining the original word functionality. To achieve this, we utilize prototypes to learn clustered representations, where words of similar functionality are encouraged to be closer to the same prototype during training. Additionally, we design different methods to find prototypes for token-level and sentence-level tasks, which can improve performance by incorporating semantic and task information. Experimental results on token and sentence classification tasks show that TextObfuscator achieves improvement over compared methods without increasing inference cost.
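The two ideas in the abstract, pulling functionally similar words toward a shared prototype during training and adding random perturbations before upload, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss, the noise distribution, or how prototypes are chosen, so the squared-distance loss, the Gaussian noise, the 2-D vectors, and the names `nearest_prototype`, `clustering_loss`, and `obfuscate` are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(vec, prototypes):
    """Index of the prototype closest to vec."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(vec, prototypes[i]))


def clustering_loss(reps, labels, prototypes):
    """Mean squared distance from each representation to its assigned prototype.

    Assumed stand-in for a prototype-based training objective: minimizing it
    pulls representations with the same label toward the same prototype.
    """
    return sum(sq_dist(v, prototypes[c]) for v, c in zip(reps, labels)) / len(reps)


def obfuscate(vec, sigma, rng):
    """Add zero-mean Gaussian noise to every coordinate (noise family is an assumption)."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


# Hypothetical toy data: two well-separated clusters in 2-D.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
reps = [[0.5, -0.3], [9.6, 10.2]]
labels = [0, 1]

rng = random.Random(0)
noisy = [obfuscate(v, sigma=0.5, rng=rng) for v in reps]
assigned = [nearest_prototype(v, prototypes) for v in noisy]
loss = clustering_loss(reps, labels, prototypes)
```

In this toy setting the noise scale is small relative to the distance between clusters, so a perturbed vector stays nearest to its own prototype (its "function" is retained) while its exact position, which is what would identify the original word among its neighbors, is blurred.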

Citation (APA)

Zhou, X., Lu, Y., Ma, R., Gui, T., Wang, Y., Ding, Y., … Huang, X. (2023). TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 5459–5473). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.337
