AFold - Using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data


This article is free to access.

Abstract

Background: Data normalization and identification of significant differential expression are crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that real data often do not meet, including a symmetrical distribution of up- and down-regulated genes, the presence of only a few differentially expressed genes, and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for downstream analysis often depends on arbitrary choices.

Results: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes the overall distribution of read counts into account for data standardization, enhancing reliable identification of differential gene expression, especially in the case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications.

Conclusions: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance reliable identification of significant differential expression across distinct data distributions. It outperforms alternative procedures in the case of asymmetrical distributions of up- versus down-regulated genes and in the presence of outliers, both of which are common in real data sets.
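To illustrate why the symmetry assumption mentioned in the Background matters, the toy sketch below applies plain total-count scaling to a data set in which a few genes are strongly up-regulated and none are down-regulated. It is not the qtotal procedure or the aFold model, neither of which is specified in this abstract. The function names, the pseudocount, and the read counts are all hypothetical and chosen only for illustration.

```python
import math


def total_count_scale(sample, reference):
    """Scaling factor that equalizes the total read count of two samples.

    Plain library-size normalization, used here only as a baseline;
    it is NOT the qtotal method described in the abstract.
    """
    return sum(reference) / sum(sample)


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-gene log2 fold change after total-count scaling.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    scale = total_count_scale(treated, control)
    return [
        math.log2((t * scale + pseudocount) / (c + pseudocount))
        for c, t in zip(control, treated)
    ]


# Hypothetical counts: 10 genes, the last two are 8x up-regulated,
# the first eight are truly unchanged, and nothing is down-regulated.
control = [100] * 10
treated = [100] * 8 + [800, 800]

lfc = log2_fold_changes(control, treated)

for gene_index, value in enumerate(lfc):
    print(f"gene {gene_index}: log2FC = {value:.2f}")
```

Because the two up-regulated genes inflate the treated library size, the scaling factor shrinks every count, and the eight unchanged genes end up with clearly negative log2 fold changes. This is the kind of distortion that a normalization method robust to asymmetrical expression changes is meant to avoid.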

Cite

Yang, W., Rosenstiel, P., & Schulenburg, H. (2019). AFold - Using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data. BMC Genomics, 20(1). https://doi.org/10.1186/s12864-019-5686-1
