Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting

Abstract

Most NLP tasks require labeled data, and labeling is often done on crowdsourcing platforms because they scale well. However, data can be published on public platforms only if it contains no privacy-relevant information, and textual data often contains sensitive information such as person names or locations. In this work, we investigate how removing personally identifiable information (PII) as well as applying differential privacy (DP) rewriting can make text with privacy-relevant information usable for crowdsourcing. We find that DP rewriting before crowdsourcing can preserve privacy while still yielding good label quality for certain tasks and data. PII removal led to good label quality in all examined tasks, but it provides no privacy guarantees.

Cite

Mouhammad, N., Daxenberger, J., Schiller, B., & Habernal, I. (2023). Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 73–84). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.law-1.8
