Effects of input data quantity on genome-wide association studies (GWAS)

5Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.

Abstract

Many software packages have been developed for Genome-Wide Association Studies (GWAS) based on various statistical models. One key factor influencing the statistical reliability of GWAS is the amount of input data used. In this paper, we investigate how input data quantity influences output of four widely used GWAS programs, PLINK, TASSEL, GAPIT, and FaST-LMM, in the context of plant genomes and phenotypes. Both synthetic and real data are used. Evaluation is based on p- and q-values of output SNPs, and Kendall rank correlation between output SNP lists. Results show that for the same GWAS program, different Arabidopsis thaliana datasets demonstrate similar trends of rank correlation with varied input quantity, but differentiate on the numbers of SNPs passing a given p- or q-value threshold. We also show that variations in numbers of replicates influence the p-values of SNPs, but do not strongly affect the rank correlation.

Cite

CITATION STYLE

APA

Yan, Y. Y., Burbridge, C., Shi, J., Liu, J., & Kusalik, A. (2019). Effects of input data quantity on genome-wide association studies (GWAS). In International Journal of Data Mining and Bioinformatics (Vol. 22, pp. 19–43). Inderscience Publishers. https://doi.org/10.1504/IJDMB.2019.099286

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free