Although distributed learning has gained increasing attention for its ability to utilize local devices while enhancing data privacy, recent studies show that gradients publicly shared during training can reveal the private training data to a third party (gradient leakage). However, there has so far been no systematic study of the gradient leakage mechanism in Transformer-based language models. In this paper, as a first attempt, we formulate the gradient attack problem on Transformer-based language models and propose a gradient attack algorithm, TAG, to recover the local training data. Experimental results on Transformer, TinyBERT4, TinyBERT6, BERTBASE, and BERTLARGE over the GLUE benchmark show that, compared with DLG (Zhu et al., 2019), TAG recovers private training data under a wider range of weight distributions and achieves 1.5× the recover rate and 2.5× the ROUGE-2 score of prior methods, without requiring the ground-truth label. By attacking gradients on the CoLA dataset, TAG recovers up to 88.9% of the tokens and achieves up to 0.93 cosine similarity in token embeddings of the private training data. In addition, TAG is stronger than previous approaches on larger models, smaller dictionary sizes, and shorter input lengths.
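The attack described here follows the gradient-matching recipe: the attacker initializes dummy inputs, computes the gradients those dummies would induce on the shared model, and then optimizes the dummies until their gradients match the gradients shared during training. The sketch below illustrates this loop in PyTorch under simplifying assumptions that are not from the paper: a toy linear classifier stands in for the Transformer encoder, token embeddings are optimized directly rather than discrete tokens, and the fixed L1 coefficient is an illustrative stand-in for TAG's layer-weighted gradient distance.

```python
# Minimal sketch of a gradient-matching (DLG/TAG-style) recovery loop.
# The model, dimensions, and the 0.01 L1 weight are illustrative assumptions,
# not the implementation or hyperparameters from the TAG paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embed_dim, num_classes, seq_len = 100, 16, 2, 8
# Toy stand-in for a Transformer-based classifier over token embeddings.
model = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * embed_dim, num_classes))
params = list(model.parameters())
criterion = nn.CrossEntropyLoss()

# 1. The "victim" computes gradients on private data and shares them.
private_embeds = torch.randn(1, seq_len, embed_dim)
private_label = torch.tensor([1])
loss = criterion(model(private_embeds), private_label)
true_grads = torch.autograd.grad(loss, params)

# 2. The attacker jointly optimizes dummy embeddings and a soft dummy label
#    so that the gradients they induce match the shared gradients.
dummy_embeds = torch.randn(1, seq_len, embed_dim, requires_grad=True)
dummy_label = torch.randn(1, num_classes, requires_grad=True)
optimizer = torch.optim.Adam([dummy_embeds, dummy_label], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    # Cross-entropy with a soft (learned) label, kept differentiable.
    dummy_loss = torch.sum(
        torch.softmax(dummy_label, dim=-1)
        * -torch.log_softmax(model(dummy_embeds), dim=-1)
    )
    dummy_grads = torch.autograd.grad(dummy_loss, params, create_graph=True)
    # Gradient-matching objective: L2 distance plus an L1 term
    # (fixed 0.01 weight assumed here for illustration).
    grad_diff = sum(
        ((dg - tg) ** 2).sum() + 0.01 * (dg - tg).abs().sum()
        for dg, tg in zip(dummy_grads, true_grads)
    )
    grad_diff.backward()
    optimizer.step()

# Embedding-level cosine similarity, analogous to the metric reported above.
print("cosine similarity:",
      torch.nn.functional.cosine_similarity(
          dummy_embeds.flatten(), private_embeds.flatten(), dim=0).item())
```

On a real Transformer the attacker would match gradients of all encoder layers and then map the recovered embeddings back to tokens (e.g., by nearest neighbor in the embedding table); the toy model above is only meant to make the optimization loop concrete.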
Deng, J., Wan, Y., Li, J., Wang, C., Shang, C., Liu, H., … Ding, C. (2021). TAG: Gradient Attack on Transformer-based Language Models. In Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 3600–3610). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.305