Evaluating the coverage and potential of imputing the exome microarray with next-generation imputation using the 1000 Genomes Project

Abstract

Next-generation genotyping microarrays have been designed with insights from large-scale sequencing of exomes and whole genomes. The exome genotyping arrays promise to query the functional regions of the human genome at a fraction of the sequencing cost, thus allowing a large number of samples to be genotyped. However, two pertinent questions exist: firstly, how representative is the content of the exome chip for populations not involved in the design of the chip; secondly, can the content of the exome chip be imputed with the reference data from the 1000 Genomes Project (1KGP). By deep whole-genome sequencing of two Asian populations that are not part of the 1KGP, comprising 96 Southeast Asian Malays and 36 South Asian Indians for which the same samples have also been genotyped on both the Illumina 2.5 M and exome microarrays, we found that the exome chip is a poor representation of the exonic content in our two populations. However, up to 94.1% of the variants on the exome chip that are polymorphic in our populations can be confidently imputed with existing non-exome-centric microarrays using the 1KGP panel. Coverage increases further when population-specific reference data from whole-genome sequencing are available. There is thus limited gain in using the exome chip for populations not involved in the microarray design. For the same cost as genotyping 2,000 samples on the exome chip, whole-genome sequencing of at least 35 samples from that population to complement the 1KGP may instead yield higher coverage of the exonic content through imputation.

Figures

  • Figure 1. (A) The proportion of monomorphic and polymorphic exonic variants in the Illumina exome chip when assessed in each of the three Singapore populations. The exonic variants on the exome chip are further categorized according to whether they are present in any of the reference panels from the 1000 Genomes Project or the Singapore Sequencing Study for the Malays and Indians (“Covered”) and can in theory be imputed, or not present in any of the existing reference panels and thus cannot be recovered through imputation (“Not covered”). (B) Distribution of SNPs on the exome chip according to the minor allele frequencies (MAFs) into monomorphic (MAF = 0%), rare (0% < MAF ≤ 1%), low-frequency (1% < MAF ≤ 5%) and common (MAF > 5%) in each of the three populations. (C) MAF categorization of the polymorphic exome chip SNPs in each of the three populations according to whether these SNPs are present (non-purple bars) or not (purple bars) in the respective reference panels. Numbers in brackets indicate the number of SNPs in the respective categories. doi:10.1371/journal.pone.0106681.g001
  • Figure 2. The percentage of polymorphic exome chip SNPs in each of the three populations that can be reliably imputed against three different reference panels using the SNPs on the Illumina HumanOmni2.5 as input. Each of these SNPs is categorized according to the minor allele frequency (MAF) as rare (0% < MAF ≤ 1%), low-frequency (1% < MAF ≤ 5%) and common (MAF > 5%). See Figures S1 and S2 in the Supplementary Material for the equivalent figures when SNPs on the HumanHap550 and Human1M are used as input respectively. The total number of imputed exome SNPs when using Illumina HumanOmni2.5/HumanHap550/Human1M as the study panel is shown in Tables S4, S5 and S6. doi:10.1371/journal.pone.0106681.g002
  • Table 1. Discordance (%) between imputed genotypes and actually observed genotypes at SNPs on Omni2.5 but not in the exome chip.
  • Table 2. Discordance (%) between imputed genotypes and actually observed minor allele genotypes at rare and low-frequency SNPs on the exome chip but not in the Omni2.5.
  • Table 3. Actual and recoverable content of exonic variants in 96 Malays (SSMP) and 36 Indians (SSIP) based on HumanOmni2.5 as the study panel.
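
The MAF categories used in the captions of Figures 1 and 2 can be written down as a small classifier. The following is only an illustrative sketch, not the authors' code: the function names `minor_allele_frequency` and `maf_category` are hypothetical, frequencies are expressed as fractions rather than percentages, and diploid autosomal genotypes (two alleles per sample) are assumed.

```python
def minor_allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """Return the minor allele frequency as a fraction in [0, 0.5].

    Assumes diploid samples, so the total number of alleles is 2 * n_samples.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total_alleles = 2 * n_samples
    if not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("alt_allele_count must lie between 0 and 2 * n_samples")
    freq = alt_allele_count / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def maf_category(maf: float) -> str:
    """Map a MAF (fraction) to the categories named in the figure captions."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must lie in [0, 0.5]")
    if maf == 0.0:
        return "monomorphic"    # MAF = 0%
    if maf <= 0.01:
        return "rare"           # 0% < MAF <= 1%
    if maf <= 0.05:
        return "low-frequency"  # 1% < MAF <= 5%
    return "common"             # MAF > 5%
```

For instance, with the 96 Malay samples mentioned in the abstract, a variant observed on a single chromosome has a MAF of 1/192, which `maf_category` places in the rare category.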

Citation (APA)

Tantoso, E., Wong, L. P., Li, B., Saw, W. Y., Xu, W., Little, P., … Teo, Y. Y. (2014). Evaluating the coverage and potential of imputing the exome microarray with next-generation imputation using the 1000 Genomes Project. PLoS ONE, 9(9). https://doi.org/10.1371/journal.pone.0106681
