How to do quantile normalization correctly for gene expression data analyses

77Citations
Citations of this article
194Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly on whole data sets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperform whole-data quantile normalization, and is robust-preserving useful signals even during the combined analysis of separately-normalized datasets. Quantile normalization is a commonly used procedure. But when carelessly applied on whole datasets without first considering class-effect proportion and batch effects, can result in poor performance. If quantile normalization must be used, then we recommend using the “Class-specific” strategy.

Cite

CITATION STYLE

APA

Zhao, Y., Wong, L., & Goh, W. W. B. (2020). How to do quantile normalization correctly for gene expression data analyses. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-72664-6

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free