Tag SNP selection based on multivariate linear regression


This article is free to access.

Abstract

The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represents the rest of the SNPs. Tagging saves budget by genotyping only a limited number of SNPs and computationally inferring all the others, and it compacts extremely long SNP sequences (obtained, e.g., from Affymetrix Map Array) for further fine genotype analysis. Tagging should first choose tags from the SNPs under consideration and then, knowing the values of the chosen tag SNPs, predict (or statistically cover) the non-tag SNPs. In this paper we propose a new SNP prediction method based on rounding of multivariate linear regression (MLR) analysis in sigma-restricted coding. When predicting a non-tag SNP, the MLR method accumulates information about all tag SNPs, resulting in significantly higher prediction accuracy with the same number of tags than previously known tagging methods. We also show that tag selection strongly depends on how the chosen tags will be used: the advantage of one tag set over another can only be judged with respect to a certain prediction method. Two simple universal tag selection methods have been applied: a (faster) stepwise algorithm and a (slower) local-minimization algorithm. An extensive experimental study on various datasets, including 6 regions from HapMap, shows that MLR prediction combined with stepwise tag selection uses significantly fewer tags (e.g., up to two times fewer tags to reach 90% prediction accuracy) than the state-of-the-art methods of Halperin et al. [8] for genotypes and Halldorsson et al. [7] for haplotypes, respectively. Our stepwise tagging matches the quality of STAMPA [8] while being faster. The code is publicly available at http://alla.cs.gsu.edu/~software. © Springer-Verlag Berlin Heidelberg 2006.
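
To make the two ingredients of the abstract concrete, the following Python sketch illustrates rounded MLR prediction and greedy stepwise tag selection. It is not the authors' released code, and several details are assumptions made for illustration: genotypes are taken to be 0/1/2 and coded as -1/0/1 as a stand-in for the sigma-restricted coding, the regression has no intercept, the stepwise criterion is plain training-set error, and the names `encode`, `mlr_predict`, `prediction_error`, `stepwise_tags` and the array `toy` are hypothetical.

```python
import numpy as np


def encode(genotypes):
    """Map genotype values {0, 1, 2} to {-1, 0, 1} (assumed coding)."""
    return np.asarray(genotypes, dtype=float) - 1.0


def mlr_predict(train, tags, new_tag_values):
    """Predict every SNP of new samples from their tag SNP values.

    train          : (n_samples, n_snps) array with entries in {0, 1, 2}
    tags           : list of column indices used as tag SNPs
    new_tag_values : (m, len(tags)) array with entries in {0, 1, 2}
    """
    train = np.asarray(train)
    X = encode(train[:, tags])   # tag SNPs are the regressors
    Y = encode(train)            # all SNPs are the responses
    # Least-squares fit of Y ~ X @ B, one coefficient column per SNP.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    raw = encode(new_tag_values) @ B
    # Round to the nearest valid coded value, then decode back to {0, 1, 2}.
    return np.clip(np.rint(raw), -1, 1).astype(int) + 1


def prediction_error(data, tags):
    """Fraction of entries mispredicted when the given tags are used."""
    data = np.asarray(data)
    predicted = mlr_predict(data, tags, data[:, tags])
    return float(np.mean(predicted != data))


def stepwise_tags(data, k):
    """Greedily add, k times, the SNP that most reduces prediction error."""
    data = np.asarray(data)
    tags = []
    for _ in range(k):
        candidates = [j for j in range(data.shape[1]) if j not in tags]
        best = min(candidates, key=lambda j: prediction_error(data, tags + [j]))
        tags.append(best)
    return tags


# Toy data (made up): SNP 2 copies SNP 0, SNP 3 is the complement of SNP 1.
toy = np.array([
    [0, 0, 0, 2],
    [1, 0, 1, 2],
    [2, 0, 2, 2],
    [0, 2, 0, 0],
    [1, 2, 1, 0],
    [2, 2, 2, 0],
    [0, 1, 0, 1],
    [2, 1, 2, 1],
])
chosen = stepwise_tags(toy, 2)
print(chosen, prediction_error(toy, chosen))
```

Because each candidate tag is scored by running the MLR predictor itself, the chosen tag set is tied to that predictor, which mirrors the abstract's point that one tag set can only be compared with another with respect to a given prediction method.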

Cite


He, J., & Zelikovsky, A. (2006). Tag SNP selection based on multivariate linear regression. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3992 LNCS-II, pp. 750–757). Springer Verlag. https://doi.org/10.1007/11758525_101
