On the utility of biobanks linked to electronic medical records in genome-wide association studies

  • L. D
  • M.D. R
  • J.C. D
  • et al.

Abstract

Low levels of high-density lipoprotein cholesterol (HDL-C) are predictive of cardiovascular disease and myocardial infarction. Both genetics and the environment contribute to the variability of HDL-C trait distribution in the general population. To identify the genetic variants associated with HDL-C, candidate gene and genome-wide association studies (GWAS) have been performed in epidemiologic studies drawn from general population settings. Collectively, these studies identified more than ten genes or genomic regions associated with HDL-C levels in populations of European descent, many of which have replicated in subsequent studies. An alternative strategy to the well-characterized epidemiologic study is the use of large DNA repositories or biobanks linked to electronic medical records (EMRs) as a source of data suitable for GWAS for gene discovery and replication. The advantages of biobanks are severalfold, including rapid accrual of samples, multiple phenotypes and traits linked to each DNA sample, and dense pharmacologic data for drug exposure assessment. To explore this alternative strategy, the National Human Genome Research Institute's electronic Medical Records and Genomics (eMERGE) Network aims to assess the utility of EMRs coupled to DNA repositories as a tool for GWAS and other genome studies. BioVU, the Vanderbilt DNA Databank, is one such repository of DNA samples extracted from discarded blood samples collected for routine clinical testing. These DNA samples are linked to a de-identified image of the EMR called the Synthetic Derivative. To date, BioVU contains more than 78,000 DNA samples. As part of BioVU's participation in eMERGE, a subset of BioVU DNA samples was genotyped on the Illumina Human660W-Quadv1-A by the Center for Genotyping and Analysis at the Broad Institute.
Natural language processing algorithms were developed to select DNA samples for genotyping from patients with normal electrocardiograms without evidence of cardiac disease, as the QRS duration (a trait of the electrocardiogram) was the primary trait for analysis. Data were cleaned using the quality control pipeline developed by the eMERGE Genomics Working Group, and a total of 514,841 SNPs in 2,337 samples were available for the primary trait analysis. A subset of the genotyped samples also contained trait information on HDL-C. As a secondary analysis, single SNP tests of association for median HDL-C were performed on these 1,079 genotyped samples. In unadjusted linear regressions assuming an additive genetic model, transformed HDL-C was associated with CETP SNPs rs1532624 and rs1800775, with p-values (betas) of 1.79 × 10⁻⁸ (0.083) and 2.62 × 10⁻⁸ (0.082), respectively. Two other CETP SNPs were also associated with HDL-C levels at a significance threshold of 10⁻⁷. Thus, our "top hits" in this EMR-based GWAS replicate the findings of several GWAS and candidate gene studies performed in epidemiologic cohorts for HDL-C, suggesting that EMR-based DNA repositories will be useful for discovery and replication for a variety of algorithm-defined phenotypes and traits linked to DNA samples.
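
The single SNP test described above (an unadjusted linear regression of the trait on genotype under an additive model) can be illustrated with a minimal sketch. This is not the authors' pipeline. The function name `additive_snp_regression` is hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, or 2), the trait is assumed to be already transformed (the abstract does not say which transformation was used), and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math


def additive_snp_regression(dosages, trait):
    """Regress a quantitative trait on allele dosage for one SNP.

    dosages: minor-allele counts per sample (0, 1, or 2), i.e. additive coding.
    trait:   trait values per sample, already transformed (assumption).
    Returns (beta, standard_error, two_sided_p).
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need matched inputs with at least 3 samples")

    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))

    # Ordinary least squares slope: change in trait per copy of the allele.
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x

    # Residual variance with n - 2 degrees of freedom gives the slope's SE.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: standard error is zero")

    # Two-sided p-value from a normal approximation (assumption; a t
    # distribution with n - 2 degrees of freedom is the exact reference).
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p
```

In a genome-wide scan, a function like this would be called once per SNP, and the resulting p-values compared against a stringent threshold such as the 10⁻⁷ level mentioned in the abstract.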

Cite

L., D., M.D., R., J.C., D., J.M., P., M.A., B., A.H., R., … D.C., C. (2010). On the utility of biobanks linked to electronic medical records in genome-wide association studies. HUGO Journal. L. Dumitrescu, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, United States: Springer Netherlands. Retrieved from http://ovidsp.ovid.com/ovidweb.cgi?T=JS&PAGE=reference&D=emed9&NEWS=N&AN=70337761
