#5490 GENERATIVE ARTIFICIAL INTELLIGENCE FOR CREATION OF SYNTHETIC HYPERTENSION TRIAL DATA

  • Jain C
  • Judge C

This article is free to access.

Abstract

Background and Aims: Synthetic data can be an effective supplement or alternative to real data for training machine learning models. It may also be used to evaluate new tools, develop educational curricula, or remove undesirable biases from datasets. We aimed to evaluate four synthetic data generation methods applied to data from a hypertension randomized clinical trial. Method: The Systolic Blood Pressure Intervention Trial (SPRINT) showed that, in high-risk patients with hypertension, intensive BP control to a systolic BP (SBP) target of <120 mm Hg yields significant cardiovascular benefits compared with routine BP control to <140 mm Hg. The Synthetic Data Vault (SDV) is an ecosystem of synthetic data generation libraries that lets users generate new data with the same format and statistical properties as the original dataset. SDV supports multiple data types, including date-time, discrete-ordinal, categorical, and numerical. SPRINT data were pre-processed into a single table of 140,000 patient visits with baseline variables (age, sex, race, aspirin use, and estimated glomerular filtration rate (eGFR)) and visit-level variables (systolic and diastolic blood pressure, heart rate, and total number of antihypertensive medications at the end of the visit). Using the SDV library for Python, we created synthetic SPRINT data with four generative models: (1) Gaussian copula, (2) Conditional Tabular Generative Adversarial Network (CTGAN), (3) CopulaGAN, and (4) Tabular Variational Autoencoder (TVAE). We evaluated the results with the SDMetrics library, which assesses the shapes of the columns (marginal distributions), the pairwise trends between columns (correlations), the reproduction of mathematical properties of the original data, and new row synthesis. Finally, we computed an overall quality score that combines the marginal distribution and correlation scores, where 0 indicates the lowest quality and 1 the highest (reported as percentages in the Results).
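
To make the first of the four models concrete, the following is a minimal standard-library sketch of the Gaussian copula idea: convert each column to normal scores, estimate the correlation between those scores, draw correlated normal samples, and map them back through each column's empirical quantiles. It is not the SDV implementation and does not use SPRINT data; the toy `age` and `sbp` columns, the function names, and the use of empirical rather than fitted marginals are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf

def to_normal_scores(values):
    """Map a column to standard-normal scores via its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def cholesky(m):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (m[i][i] - s) ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def fit_sample_gaussian_copula(columns, num_rows, seed=0):
    """Fit a Gaussian copula to numeric columns and sample synthetic rows.

    Simplification (assumption): marginals are the empirical quantiles of
    the real data, so sampled values are always observed real values.
    """
    rng = random.Random(seed)
    names = list(columns)
    scores = [to_normal_scores(columns[name]) for name in names]
    corr = [[correlation(a, b) for b in scores] for a in scores]
    low = cholesky(corr)
    sorted_cols = [sorted(columns[name]) for name in names]
    n = len(sorted_cols[0])
    rows = []
    for _ in range(num_rows):
        z = [rng.gauss(0, 1) for _ in names]
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1))
                      for i in range(len(names))]
        row = {}
        for name, col, value in zip(names, sorted_cols, correlated):
            u = STD_NORMAL.cdf(value)            # back to a uniform in (0, 1)
            row[name] = col[min(int(u * n), n - 1)]  # empirical quantile
        rows.append(row)
    return rows

# Toy stand-in data (hypothetical, not SPRINT): age and a correlated SBP.
_rng = random.Random(42)
ages = [_rng.uniform(50, 90) for _ in range(500)]
sbp = [100 + 0.5 * a + _rng.gauss(0, 10) for a in ages]
real = {"age": ages, "sbp": sbp}
synthetic = fit_sample_gaussian_copula(real, num_rows=1000)
```

Because the copula separates the marginal distributions from the dependence structure, this approach tends to preserve column shapes and pairwise correlations directly, which are the two properties the quality score measures.
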
Results: Two hundred thousand synthetic patient visits were created with each method. In descending order, the overall quality scores were 90.67% for Gaussian copula, 86.77% for TVAE, 81.03% for CTGAN, and 79.7% for CopulaGAN. The column shape score, which represents the marginal distributions, was highest for Gaussian copula (94.54%), followed by TVAE (88.44%), CTGAN (82.35%), and CopulaGAN (80.27%). The column pair trend score, which corresponds to correlations, was highest for Gaussian copula (86.8%), followed by TVAE (85.1%), CTGAN (79.72%), and CopulaGAN (79.12%). Conclusion: Gaussian copula produced the highest-scoring synthetic SPRINT data on marginal distributions, correlations, and overall score. The Synthetic Data Vault is a feasible collection of methods for generating synthetic clinical trial data to train future machine learning and AI models.
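
The reported overall scores can be checked against the two component scores. The sketch below assumes the overall quality score is the simple mean of the column shape and column pair trend scores, which is how the SDMetrics quality report is documented to combine its properties; the abstract itself only calls it an amalgamation. The dictionary and function names are illustrative, and the numbers are copied from the Results.

```python
# Scores from the Results, in percent:
# (column shapes, column pair trends, reported overall quality)
reported = {
    "Gaussian copula": (94.54, 86.80, 90.67),
    "TVAE": (88.44, 85.10, 86.77),
    "CTGAN": (82.35, 79.72, 81.03),
    "CopulaGAN": (80.27, 79.12, 79.70),
}

def overall_quality(column_shapes, column_pair_trends):
    """Assumed aggregation: unweighted mean of the two property scores."""
    return (column_shapes + column_pair_trends) / 2

recomputed = {
    name: overall_quality(shapes, trends)
    for name, (shapes, trends, _) in reported.items()
}

# Rank the methods by recomputed overall score, best first.
ranking = sorted(recomputed, key=recomputed.get, reverse=True)

for name in ranking:
    print(f"{name}: recomputed {recomputed[name]:.3f}%, "
          f"reported {reported[name][2]:.2f}%")
```

Under that assumption, each recomputed mean agrees with the reported overall score to within rounding, and the ranking matches the order given in the Results.
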

Cite

Jain, C., & Judge, C. (2023). #5490 GENERATIVE ARTIFICIAL INTELLIGENCE FOR CREATION OF SYNTHETIC HYPERTENSION TRIAL DATA. Nephrology Dialysis Transplantation, 38(Supplement_1). https://doi.org/10.1093/ndt/gfad063c_5490
