Abstract
Featured Application: The findings of this study have direct implications for improving data privacy and utility in healthcare analytics. By evaluating the synthetic data generation methods CTGAN, TVAE, CopulaGAN, and Copula across diverse medical datasets containing sensitive patient information, such as genetic profiles and medical histories, the research aims to support the development of predictive models without compromising patient privacy.

Synthetic data generation (SDG) holds significant promise for augmenting limited datasets while avoiding privacy issues, facilitating research, and enhancing the robustness of machine learning models. Generative Adversarial Networks (GANs) stand out as promising tools; they employ two neural networks, a generator and a discriminator, to produce synthetic data that mirrors real data distributions. This study evaluates two GAN variants (CTGAN and CopulaGAN), a variational autoencoder (TVAE), and copulas on diverse real datasets of differing complexity that contain both numerical and categorical attributes. The results highlight CTGAN’s sensitivity to training parameters and TVAE’s robustness across datasets. Scalability challenges persist, with GANs demanding substantial computational resources. TVAE stands out for its high utility across all datasets, even for high-dimensional data, though it incurs higher privacy risks, which is indicative of the curse of dimensionality. While no single model universally excels, understanding the trade-offs and leveraging each model’s strengths can significantly enhance SDG. Future research should focus on adaptive learning mechanisms, scalability enhancements, and standardized evaluation metrics to advance SDG methods effectively. Addressing these challenges will foster broader adoption and application of synthetic data.
Miletic, M., & Sariyar, M. (2024). Challenges of Using Synthetic Data Generation Methods for Tabular Microdata. Applied Sciences (Switzerland), 14(14). https://doi.org/10.3390/app14145975