A robust clustering algorithm for identifying problematic samples in genome-wide association studies

Abstract

High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. © The Author(s) 2011. Published by Oxford University Press. All rights reserved.
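The idea of flagging individuals whose genome-wide summaries differ from the sample at large can be illustrated with a minimal sketch. This is not the authors' clustering algorithm, whose details are not given in the abstract; it swaps in a simple median/MAD rule as a stand-in. The function name `flag_atypical_samples`, the threshold of 3.5, and the heterozygosity values are all hypothetical and chosen only for illustration.

```python
from statistics import median


def flag_atypical_samples(values, threshold=3.5):
    """Return indices of samples whose summary statistic lies far from the bulk.

    Illustrative stand-in only: a median/MAD rule, not the published algorithm.
    """
    centre = median(values)
    # Median absolute deviation, scaled by 1.4826 so that it estimates the
    # standard deviation when the bulk of the data is roughly normal.
    mad = 1.4826 * median(abs(v - centre) for v in values)
    if mad == 0:
        # No spread in the bulk of the data, so nothing can be ranked as atypical.
        return []
    return [i for i, v in enumerate(values) if abs(v - centre) / mad > threshold]


# Hypothetical per-sample heterozygosity rates (made-up numbers).
heterozygosity = [0.312, 0.309, 0.315, 0.311, 0.310,
                  0.313, 0.352, 0.308, 0.314, 0.271]
outliers = flag_atypical_samples(heterozygosity)
print(outliers)
```

Because the median and MAD are barely affected by a few extreme values, the two atypical samples do not distort the estimate of what "typical" looks like. In practice the flagged indices would be reviewed by an analyst before any sample is excluded, in keeping with the semi-automated use the abstract describes.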

Cite

Bellenguez, C., Strange, A., Freeman, C., Donnelly, P., & Spencer, C. C. A. (2012). A robust clustering algorithm for identifying problematic samples in genome-wide association studies. Bioinformatics, 28(1), 134–135. https://doi.org/10.1093/bioinformatics/btr599
