Machine Learning Model Generation With Copula-Based Synthetic Dataset for Local Differentially Private Numerical Data


This article is free to access.

Abstract

With the development of IoT technology, personal data are being collected in many places. These data can be used to create new services, but the individual's privacy must be taken into account. By applying differential privacy, we can collect personal data safely by adding noise to them. However, because such data are very noisy, the accuracy of machine learning models trained on them decreases greatly. In this study, our objective is to build a highly accurate machine learning model using these data. We focus on the decision tree machine learning algorithm, and, instead of applying it as is, we use a preprocessing technique in which pseudodata are generated using a copula while the effect of the noise added by differential privacy is removed. In detail, the proposed novel protocol consists of three steps: generating a covariance matrix from the differentially private numerical data, generating a discrete cumulative distribution function from the differentially private numerical data, and generating copula-based numerical samples. Simulation results using synthetic and real datasets verify the utility of the proposed method not only for the decision tree algorithm but also for other machine learning algorithms such as deep neural networks. This method will help create machine learning models, such as recommendation systems, using differentially private data.
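The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes each attribute lies in [0, 1], that every user perturbs each attribute locally with the Laplace mechanism, and that a Gaussian copula is used. The function names (`laplace_perturb`, `corrected_correlation`, `sample_copula`) are hypothetical. The marginal step is a deliberate simplification that uses clipped quantiles of the noisy values, whereas the paper builds a discrete cumulative distribution function with the noise effect removed.

```python
import numpy as np
from statistics import NormalDist


def laplace_perturb(x, epsilon, rng):
    """Local perturbation: add Laplace noise to every value.

    Assumes values lie in [0, 1] (sensitivity 1) and that epsilon is the
    budget spent on each attribute separately.
    """
    scale = 1.0 / epsilon
    return x + rng.laplace(0.0, scale, size=x.shape), scale


def corrected_correlation(noisy, scale):
    """Estimate a correlation matrix from noisy data.

    Independent Laplace noise adds 2 * scale**2 to every variance and
    leaves the off-diagonal covariances unbiased, so only the diagonal
    is corrected.
    """
    cov = np.cov(noisy, rowvar=False)
    diag = np.clip(np.diag(cov) - 2.0 * scale**2, 1e-6, None)
    std = np.sqrt(diag)
    corr = cov / np.outer(std, std)
    np.fill_diagonal(corr, 1.0)
    # Sampling error can break positive semidefiniteness, so clip the
    # eigenvalues and rescale back to a unit diagonal.
    w, v = np.linalg.eigh(corr)
    corr = v @ np.diag(np.clip(w, 1e-6, None)) @ v.T
    d = np.sqrt(np.diag(corr))
    return corr / np.outer(d, d)


def sample_copula(noisy, corr, n, rng):
    """Draw n synthetic rows from a Gaussian copula.

    Simplification: marginals are the quantiles of the noisy values
    clipped to [0, 1], not a denoised discrete CDF.
    """
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    u = np.vectorize(NormalDist().cdf)(z)  # uniform marginals in [0, 1]
    cols = [
        np.quantile(np.clip(noisy[:, j], 0.0, 1.0), u[:, j])
        for j in range(noisy.shape[1])
    ]
    return np.column_stack(cols)
```

The synthetic rows returned by `sample_copula` could then be passed to a decision tree or another learner in place of the noisy records. How closely this approximates the original data depends on the sample size and on epsilon, and the sketch makes no claim about matching the accuracy reported in the article.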

Cite

Sei, Y., Andrew Onesimu, J., & Ohsuga, A. (2022). Machine Learning Model Generation With Copula-Based Synthetic Dataset for Local Differentially Private Numerical Data. IEEE Access, 10, 101656–101671. https://doi.org/10.1109/ACCESS.2022.3208715
