A general approach is presented for the extraction of a classifier of disease risk that is latent in large scale disease/control databases. Novel features are the following: (1) a data reorganization into a regularized standard form that emphasizes individual alleles instead of the single nucleotide polymorphism (Snp) allele pair to which they belong; (2) from this a procedure that significantly enhances the discovery of high value genomic loci; (3) an investigative analysis based on the hypothesis that disease represents a very small signal (small signal-to-noise) that is latent in the data. The resulting analyses applied to the FUSION T2D database leads to the polling of thousands of genomic loci to classify disease. This large genomic kernel of loci is shared by non-diabetics at nearly the same high level; but a small well defined separation exists and it is speculated that this might be due to unconventional disease mechanisms. Another analysis demonstrates that the FUSION database size limits its disease predictability, and only one third of the resulting classifier loci are estimated to relate to T2D. The remainder is associated with hidden features that might contrast the disease and control populations and that more data would eliminate. © 2014 Lawrence Sirovich.
CITATION STYLE
Sirovich, L. (2014). Genomic data and disease forecasting: Application to type 2 diabetes (T2D). PLoS ONE, 9(1). https://doi.org/10.1371/journal.pone.0085684
Mendeley helps you to discover research relevant for your work.