Correction for population stratification in random forest analysis

Abstract

Background: Population structure (PS), including population stratification and admixture, is a significant confounder in genome-wide association studies (GWAS) because it can produce spurious associations. Random forest (RF) has been increasingly applied to GWAS data because it handles high-dimensional genetic data well. RF produces importance measures for single nucleotide polymorphisms (SNPs), which are useful for feature selection. However, if PS is not appropriately corrected, RF tends to assign high importance to disease-unrelated SNPs whose allele or genotype frequencies differ among subpopulations, leading to inaccurate results.

Methods: In this study, the authors propose to correct for the confounding effect of PS by including PS information in the RF analysis. The correction procedure starts by extracting PS information from a large number of structure inference SNPs, using EIGENSTRAT or a multi-dimensional scaling clustering procedure. The phenotype and genotypes, adjusted for the PS information, are then used as the outcome and predictors in the RF analysis.

Results: Extensive simulations indicate that the importance measure of the causal SNP increases following the PS correction. In an analysis of a real dataset, the proposed correction removes the spurious association between the lactase gene and height.

Conclusion: The authors propose a simple method to correct for PS in RF analysis of GWAS data. Further studies in real GWAS datasets are required to validate the robustness of the proposed approach.

Published by Oxford University Press on behalf of the International Epidemiological Association. © The Author 2012; all rights reserved.
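The adjustment step described in the Methods can be illustrated with a minimal sketch. It assumes that "adjusted by the information of PS" means taking linear-regression residuals on the PS axes, as EIGENSTRAT-style corrections do; the abstract does not spell this out. The function names, the single PS axis, and the toy data are all hypothetical, and in practice the axes would come from EIGENSTRAT or multi-dimensional scaling rather than being typed in by hand.

```python
# Sketch only: residualize a phenotype and a genotype column on
# population-structure (PS) axes. Assumption: "adjustment" means
# linear-regression residuals. The PS axes are taken as given.

def center(values):
    """Subtract the mean, which accounts for the regression intercept."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(axes):
    """Gram-Schmidt on the centered axes; near-zero directions are dropped."""
    basis = []
    for axis in axes:
        v = center(axis)
        for b in basis:
            coef = dot(v, b)
            v = [x - coef * y for x, y in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:
            basis.append([x / norm for x in v])
    return basis

def residualize(values, basis):
    """Residuals of a least-squares fit of `values` on intercept + PS axes."""
    r = center(values)
    for b in basis:
        coef = dot(r, b)
        r = [x - coef * y for x, y in zip(r, b)]
    return r

# Toy data (hypothetical): six individuals from two subpopulations.
ps_axes = [[-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]]        # one PS axis
height = [160.0, 162.0, 161.0, 170.0, 172.0, 171.0]  # phenotype
snp = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0]                 # minor-allele counts

basis = orthonormal_basis(ps_axes)
adj_height = residualize(height, basis)  # outcome for the RF step
adj_snp = residualize(snp, basis)        # one predictor for the RF step
```

After this step the adjusted phenotype and the adjusted genotype columns would serve as the outcome and predictors of a random forest. Because the residuals are continuous, a regression forest (for example scikit-learn's `RandomForestRegressor`) would be one plausible choice, although the abstract does not say which implementation the authors used.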

Zhao, Y., Chen, F., Zhai, R., Lin, X., Wang, Z., Su, L., & Christiani, D. C. (2012). Correction for population stratification in random forest analysis. International Journal of Epidemiology, 41(6), 1798–1806. https://doi.org/10.1093/ije/dys183
