Identifying blocks and sub-populations in noisy SNP data

Gad Kimmel; Roded Sharan; Ron Shamir

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2812, pp. 303-319 (2003)

DOI: 10.1007/978-3-540-39763-2_23

Abstract

We study several problems arising in haplotype block partitioning. Our objective function is the total number of distinct haplotypes in blocks. We show that the problem is NP-hard when there are errors or missing data, and provide approximation algorithms for several of its variants. We also give an algorithm that solves the problem with high probability under a probabilistic model that allows noise and missing data. In addition, we study the multi-population case, where one has to partition the haplotypes into populations and seek a different block partition in each one. We provide a heuristic for that problem and use it to analyze simulated and real data. On simulated data, our blocks resemble the true partition more than the blocks generated by the LD-based algorithm of Gabriel et al. [7]. On single-population real data, we generate a more concise block description than extant approaches, with better average LD within blocks. The algorithm also gives promising results on real 2-population genotype data. © Springer-Verlag Berlin Heidelberg 2003.
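To make the objective concrete, the following is a minimal sketch of the noise-free case only, assuming complete, error-free haplotypes given as equal-length strings over SNP alleles. It is not the authors' algorithm: it is a plain dynamic program over block boundaries that minimizes the total number of distinct haplotypes summed over blocks. The function names `distinct_in_block` and `best_partition` are hypothetical, chosen for illustration.

```python
from math import inf


def distinct_in_block(haplotypes, start, end):
    """Count distinct haplotypes restricted to SNP columns [start, end)."""
    return len({h[start:end] for h in haplotypes})


def best_partition(haplotypes):
    """Return (cost, blocks) minimizing the summed distinct-haplotype count.

    Assumes no errors and no missing data; blocks are contiguous,
    half-open (start, end) intervals that together cover all SNPs.
    """
    n = len(haplotypes[0])
    best = [0] + [inf] * n   # best[j]: minimum cost of partitioning SNPs [0, j)
    back = [0] * (n + 1)     # back[j]: start of the last block in that optimum
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start] + distinct_in_block(haplotypes, start, end)
            if cost < best[end]:
                best[end], back[end] = cost, start
    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    end = n
    while end > 0:
        blocks.append((back[end], end))
        end = back[end]
    blocks.reverse()
    return best[n], blocks


# Toy data: two 2-SNP blocks, each with three local haplotypes, fully recombined.
local = ("00", "01", "11")
haps = [a + b for a in local for b in local]
cost, blocks = best_partition(haps)
```

On this toy input, a single block covering all four SNPs contains 9 distinct haplotypes, whereas splitting after the second SNP gives 3 + 3 = 6. The sketch does not handle errors or missing entries, which is the setting the abstract reports to be NP-hard.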

Kimmel, G., Sharan, R., & Shamir, R. (2003). Identifying blocks and sub-populations in noisy SNP data. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2812, 303–319. https://doi.org/10.1007/978-3-540-39763-2_23