Tab-VAE: A Novel VAE for Generating Synthetic Tabular Data

Syed Mahir Tazwar; Max Knobbout; Enrique Hortal Quesada; Mirela Popa

Conference Proceedings (Open Access)

International Conference on Pattern Recognition Applications and Methods (2024), Vol. 1, pp. 17–26

DOI: 10.5220/0012302400003654

Abstract

Variational Autoencoders (VAEs) suffer from a well-known problem of overpruning, or posterior collapse, caused by strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoding of categorical columns expands the dimensionality of the feature space dramatically, which makes modeling multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generating synthetic tabular data that tackles this challenge by introducing a sampling technique at inference for categorical variables. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodology from Generative Adversarial Networks (GANs), while simpler, more stable VAE methods are overlooked. Our extensive evaluation of Tab-VAE against other leading generative models shows that Tab-VAE improves significantly on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.
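The abstract does not spell out the sampling technique, so the following is only an illustrative sketch of the two ideas it mentions, not the authors' implementation. It assumes a decoder that emits one block of logits per categorical column and contrasts drawing the category from the softmax distribution with taking the argmax. The helper names (`one_hot_width`, `decode_categorical`) and the example logits and cardinalities are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_width(cardinalities):
    """Number of input features after one-hot encoding every categorical column."""
    return sum(cardinalities)


def decode_categorical(logits, rng, sample=True):
    """Turn one column's decoder logits into a category index.

    sample=True  draws from the softmax distribution (assumed reading of
                 "sampling at inference"; hypothetical, not from the paper).
    sample=False takes the most likely category (argmax).
    """
    probs = softmax(logits)
    if sample:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


# Three categorical columns with 3, 50 and 200 classes become 253 features.
width = one_hot_width([3, 50, 200])

# Hypothetical decoder output for a 3-class column.
rng = random.Random(0)
logits = [2.0, 1.0, 0.1]
draws = [decode_categorical(logits, rng) for _ in range(10_000)]
freqs = [draws.count(k) / len(draws) for k in range(len(logits))]
```

With argmax decoding, every generated row with these logits would receive category 0; with sampling, the empirical frequencies in `freqs` approach the softmax probabilities, so less likely categories still appear in the synthetic data.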

Cite

Tazwar, S. M., Knobbout, M., Quesada, E. H., & Popa, M. (2024). Tab-VAE: A Novel VAE for Generating Synthetic Tabular Data. In International Conference on Pattern Recognition Applications and Methods (Vol. 1, pp. 17–26). Science and Technology Publications, Lda. https://doi.org/10.5220/0012302400003654
