Deep Learning based Privacy Information Identification approach for Unstructured Text

4Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Data sharing sometimes brings the privacy disclosure risk. Anonymization methods such as k-anonymity, l-diversity prevent privacy disclosure, but such methods are suitable for structured text. There are a lot of unstructured texts in people's lives (such as social network texts, clinical texts), and identifying and structuring the private information(PI) of unstructured texts is a problem. Based on this, we propose a deep learning-based unstructured text PI identification approach, which can extract PI in unstructured text, associate the PI with the corresponding subject, and organize it into structured data, to support follow-up anonymization. This approach is divided into two tasks: PI identification and PI association. we respectively propose a sequence labeling model based on the RoBERTa-BiLSTM-CRF hybrid neural network and a PI association method based on the RoBERTa-HCR hybrid neural network to identify PI and organize it into structured data. The experimental results show that, compared with the benchmark model, RoBEERTa-BiLSTM-CRF has better performance; compared with the current Chinese coreference resolution model, the average F1-score value of RoBERTa-HCR is increased by 6%.

Cite

CITATION STYLE

APA

Ning, Y., Wang, N., Liu, A., & Du, X. (2021). Deep Learning based Privacy Information Identification approach for Unstructured Text. In Journal of Physics: Conference Series (Vol. 1848). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1848/1/012032

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free