On rare variants in principal component analysis of population stratification

15Citations
Citations of this article
34Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Population stratification is a known confounder of genome-wide association studies, as it can lead to false positive results. Principal component analysis (PCA) method is widely applied in the analysis of population structure with common variants. However, it is still unclear about the analysis performance when rare variants are used. Results: We derive a mathematical expectation of the genetic relationship matrix. Variance and covariance elements of the expected matrix depend explicitly on allele frequencies of the genetic markers used in the PCA analysis. We show that inter-population variance is solely contained in K principal components (PCs) and mostly in the largest K-1 PCs, where K is the number of populations in the samples. We propose FPC, ratio of the inter-population variance to the intra-population variance in the K population informative PCs, and d 2, sum of squared distances among populations, as measures of population divergence. We show analytically that when allele frequencies become small, the ratio FPC abates, the population distance d 2 decreases, and portion of variance explained by the K PCs diminishes. The results are validated in the analysis of the 1000 Genomes Project data. The ratio FPC is 93.85, population distance d 2 is 444.38, and variance explained by the largest five PCs is 17.09% when using with common variants with allele frequencies between 0.4 and 0.5. However, the ratio, distance and percentage decrease to 1.83, 17.83 and 0.74%, respectively, with rare variants of frequencies between 0.0001 and 0.01. Conclusions: The PCA of population stratification performs worse with rare variants than with common ones. It is necessary to restrict the selection to only the common variants when analyzing population stratification with sequencing data.

Cite

CITATION STYLE

APA

Ma, S., & Shi, G. (2020). On rare variants in principal component analysis of population stratification. BMC Genetics, 21(1). https://doi.org/10.1186/s12863-020-0833-x

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free