Abstract
Goal: To develop a computationally efficient and unbiased synthetic data generator for large-scale in silico clinical trials (CTs). Methods: We propose the BGMM-OCE, an extension of the conventional BGMM (Bayesian Gaussian Mixture Models) algorithm to provide unbiased estimations regarding the optimal number of Gaussian components and yield high-quality, large-scale synthetic data at reduced computational complexity. Spectral clustering with efficient eigenvalue decomposition is applied to estimate the hyperparameters of the generator. A case study is conducted to compare the performance of BGMM-OCE against four straightforward synthetic data generators for in silico CTs in hypertrophic cardiomyopathy (HCM). Results: The BGMM-OCE generated 30000 virtual patient profiles having the lowest coefficient-of-variation (0.046), inter- and intra-correlation differences (0.017, and 0.016, respectively) with the real ones in reduced execution time. Conclusions: BGMM-OCE overcomes the lack of population size in HCM which obscures the development of targeted therapies and robust risk stratification models.
Author supplied keywords
Cite
CITATION STYLE
Pezoulas, V. C., Tachos, N. S., Gkois, G., Olivotto, I., Barlocco, F., & Fotiadis, D. I. (2022). Bayesian Inference-Based Gaussian Mixture Models with Optimal Components Estimation Towards Large-Scale Synthetic Data Generation for in Silico Clinical Trials. IEEE Open Journal of Engineering in Medicine and Biology, 3, 108–114. https://doi.org/10.1109/OJEMB.2022.3181796
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.