Deep Learning based Privacy Information Identification approach for Unstructured Text

Yichen Ning; Na Wang; Aodi Liu; Xuehui Du

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2021) 1848(1)

DOI: 10.1088/1742-6596/1848/1/012032

Abstract

Data sharing can carry a risk of privacy disclosure. Anonymization methods such as k-anonymity and l-diversity prevent privacy disclosure, but they are suited to structured text. Much of the text in people's lives is unstructured (for example, social network texts and clinical texts), and identifying and structuring the private information (PI) in such texts remains a problem. To address this, we propose a deep learning-based PI identification approach for unstructured text, which extracts the PI in unstructured text, associates each piece of PI with its corresponding subject, and organizes the result into structured data to support follow-up anonymization. The approach is divided into two tasks: PI identification and PI association. For PI identification, we propose a sequence labeling model based on the RoBERTa-BiLSTM-CRF hybrid neural network; for PI association, we propose a method based on the RoBERTa-HCR hybrid neural network. Together they identify PI and organize it into structured data. The experimental results show that RoBERTa-BiLSTM-CRF performs better than the benchmark model, and that the average F1-score of RoBERTa-HCR is 6% higher than that of the current Chinese coreference resolution model.
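As a rough illustration of the two-task pipeline described in the abstract, the sketch below takes the output of a sequence labeler and turns it into structured records. It is not the authors' implementation: the BIO tag scheme, the label names (`NAME`, `LOC`, `DISEASE`), the English example sentence, and the helper names `decode_bio` and `associate` are all assumptions made for illustration. In particular, `associate` replaces the paper's RoBERTa-HCR association model with a simple nearest-preceding-subject heuristic, which would not resolve pronouns or handle multiple interleaved subjects.

```python
from collections import defaultdict


def decode_bio(tokens, tags):
    """Collect (label, text, start, end) spans from BIO tags.

    Assumes a hypothetical BIO scheme such as B-NAME / I-NAME / O.
    """
    spans = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, " ".join(tokens[start:i]), start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def associate(spans, subject_label="NAME"):
    """Attach each PI span to the nearest preceding subject span.

    Stand-in heuristic only; the paper uses a learned association model.
    """
    records = defaultdict(lambda: defaultdict(list))
    current = None
    for label, text, _start, _end in spans:
        if label == subject_label:
            current = text
            records[current]  # ensure the subject gets a record
        elif current is not None:
            records[current][label].append(text)
    return {subject: dict(attrs) for subject, attrs in records.items()}


# Hypothetical labeler output for one sentence.
tokens = ["Alice", "Smith", "lives", "in", "New", "York",
          "and", "she", "has", "diabetes"]
tags = ["B-NAME", "I-NAME", "O", "O", "B-LOC", "I-LOC",
        "O", "O", "O", "B-DISEASE"]

spans = decode_bio(tokens, tags)
records = associate(spans)
print(records)
```

Each subject ends up with one record keyed by PI type, which is the kind of structured table that methods such as k-anonymity expect as input.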

Cite

Ning, Y., Wang, N., Liu, A., & Du, X. (2021). Deep Learning based Privacy Information Identification approach for Unstructured Text. In Journal of Physics: Conference Series (Vol. 1848). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1848/1/012032
