A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies


Abstract

Genome projects now generate large-scale data, often produced at different time points by different laboratories using multiple platforms, which increases the potential for batch effects. Existing batch evaluation methods, such as principal component analysis (PCA), rely mostly on visual inspection and sometimes fail to reveal all of the underlying batch effects. They also risk unintentionally correcting biologically interesting factors that have been attributed to batch effects. Here we propose a novel statistical method, finding batch effect (findBATCH), which evaluates batch effects using probabilistic principal component and covariates analysis (PPCCA). The same framework also provides a new approach to batch correction, correcting batch effect (correctBATCH), which we show to be a better approach than traditional PCA-based correction. We demonstrate the utility of these methods on two examples (breast and colorectal cancers), merging gene expression data from different studies after diagnosing and correcting for batch effects while retaining the biological effects. These methods, along with conventional visual inspection-based PCA, are available as part of an R package, exploring batch effect (exploBATCH; https://github.com/syspremed/exploBATCH).
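To make the conventional PCA-based inspection mentioned above concrete, the following is a minimal Python sketch on synthetic data. It is not findBATCH, correctBATCH, or any part of the exploBATCH package: the simulated matrix, the function names, the standardized PC1 separation score, and the naive per-batch mean-centering are all assumptions chosen for illustration only.

```python
import random

def simulate(n_per_batch=20, n_genes=30, shift=3.0, seed=0):
    """Hypothetical data: two batches; batch 1 is shifted in the first 10 genes."""
    rng = random.Random(seed)
    rows, batches = [], []
    for b in (0, 1):
        for _ in range(n_per_batch):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if b == 1:
                for g in range(10):
                    row[g] += shift
            rows.append(row)
            batches.append(b)
    return rows, batches

def center(rows):
    """Subtract the per-gene (column) mean from every sample."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_pc_scores(rows, n_iter=200):
    """Scores on the first principal component, via power iteration on X^T X."""
    X = center(rows)
    p = len(X[0])
    v = [1.0 / p ** 0.5] * p
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]            # X v
        v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]  # X^T (X v)
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in X]

def batch_separation(scores, batches):
    """Absolute difference in mean PC1 score between batches, in pooled-SD units."""
    a = [s for s, b in zip(scores, batches) if b == 0]
    c = [s for s, b in zip(scores, batches) if b == 1]
    mean_a, mean_c = sum(a) / len(a), sum(c) / len(c)
    pooled_var = (sum((s - mean_a) ** 2 for s in a)
                  + sum((s - mean_c) ** 2 for s in c)) / (len(a) + len(c) - 2)
    return abs(mean_a - mean_c) / pooled_var ** 0.5

def center_within_batch(rows, batches):
    """Naive correction (an assumption, not correctBATCH): per-batch gene centering."""
    out = [None] * len(rows)
    for b in set(batches):
        idx = [i for i, bb in enumerate(batches) if bb == b]
        for i, r in zip(idx, center([rows[i] for i in idx])):
            out[i] = r
    return out

rows, batches = simulate()
before = batch_separation(first_pc_scores(rows), batches)
after = batch_separation(first_pc_scores(center_within_batch(rows, batches)), batches)
print(f"PC1 batch separation before: {before:.2f}, after: {after:.2f}")
```

In this toy setting the batches separate clearly along the first principal component before centering and not at all afterwards. The sketch also hints at the risk the abstract describes: centering each batch removes every mean difference between batches, including any biological signal that happens to be confounded with batch.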

Cite

Nyamundanda, G., Poudel, P., Patil, Y., & Sadanandam, A. (2017). A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies. Scientific Reports, 7(1). https://doi.org/10.1038/s41598-017-11110-6
