Variational Autoencoders (VAEs) suffer from the well-known problem of overpruning, or posterior collapse, which is caused by strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoding of categorical data dramatically expands the dimensionality of the feature space, which makes modeling multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generating synthetic tabular data that tackles this challenge by introducing a sampling technique at inference for categorical variables. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodologies from Generative Adversarial Networks (GANs), while simpler, more stable VAE methods are overlooked. Our extensive evaluation of Tab-VAE against other leading generative models shows that Tab-VAE significantly improves on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.
CITATION STYLE
Tazwar, S. M., Knobbout, M., Quesada, E. H., & Popa, M. (2024). Tab-VAE: A Novel VAE for Generating Synthetic Tabular Data. In International Conference on Pattern Recognition Applications and Methods (Vol. 1, pp. 17–26). Science and Technology Publications, Lda. https://doi.org/10.5220/0012302400003654