The rise of high-throughput technologies in molecular and cell biology, as well as in medicine, has generated an unprecedented amount of quantitative high-dimensional data. Public databases now make a wealth of these data available, but appropriate normalization is critical for meaningful analyses that integrate different experiments and technologies. Without such normalization, meta-analyses are difficult to perform, and the potential to use public data to address shortcomings in experimental designs, such as inadequate replicates or controls, is limited. Because of a lack of quantitative standards and insufficient annotation, large-scale normalization across entire databases is currently restricted to approaches that demand ad hoc assumptions about the noise sources and the biological signal. By leveraging detectable redundancies in public databases, such as related samples and features, we show that blind normalization without constraints on noise sources and the biological signal is possible. The inherent recovery of confounding factors is formulated in the theoretical framework of compressed sensing and employs efficient optimization on manifolds. As public databases grow in size and offer more detectable redundancies, the proposed approach scales to more complex confounding factors. In addition, the approach accounts for missing values and can incorporate spike-in controls. Our work presents a systematic approach to the blind normalization of public high-throughput databases.
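The abstract states that confounding factors are recovered within a compressed-sensing framework using optimization on manifolds, exploiting redundancies and tolerating missing values. As a rough, hypothetical illustration of that general idea only, and not the authors' algorithm, the sketch below recovers a low-rank confounder from a partially observed data matrix by factorized gradient descent (a Burer-Monteiro-style surrogate for fixed-rank manifold optimization); all variable names, dimensions, and parameters are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's implementation): recover a low-rank
# confounding factor from a sparsely observed matrix with missing values,
# using factorized gradient descent as a stand-in for fixed-rank manifold
# optimization in a compressed-sensing-style recovery.
import numpy as np

rng = np.random.default_rng(0)

# Simulated "public database": rank-2 confounding bias plus a sparse signal,
# with roughly 30% of entries missing (hypothetical sizes).
n_samples, n_features, rank = 60, 80, 2
bias = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_features))
signal = rng.normal(size=(n_samples, n_features)) * (rng.random((n_samples, n_features)) < 0.02)
mask = rng.random((n_samples, n_features)) < 0.7      # True where an entry was measured
Y = np.where(mask, bias + signal, np.nan)             # observed data with missing values

# Factorized low-rank recovery: minimize ||mask * (U V^T - Y)||_F^2 over U, V.
U = rng.normal(scale=0.1, size=(n_samples, rank))
V = rng.normal(scale=0.1, size=(n_features, rank))
Y0 = np.nan_to_num(Y)                                 # missing entries contribute nothing (masked out)
lr = 1e-3
for _ in range(3000):
    R = mask * (U @ V.T - Y0)                         # residual on observed entries only
    U -= lr * (R @ V)                                 # gradient step in U
    V -= lr * (R.T @ U)                               # gradient step in V

bias_hat = U @ V.T                                    # estimated confounding factor
print("relative error of recovered bias:",
      np.linalg.norm(bias_hat - bias) / np.linalg.norm(bias))
```

Normalized data would then be obtained, in this toy setting, by subtracting the estimated low-rank confounder from the observed entries; the paper itself should be consulted for the actual recovery model and manifold optimization used.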
Ohse, S., Boerries, M., & Busch, H. (2019). Blind normalization of public high-throughput databases. PeerJ Computer Science, 2019, 1–16. https://doi.org/10.7717/peerj-cs.231