Abstract
Motivation: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features is essential. In experiments that collect multiple high-dimensional datasets from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.
Results: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, between the microbiome and host transcriptome, and between metabolomic profiling and human health phenotypes.
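For orientation, the naive all-against-all baseline that the abstract compares against can be sketched as follows. This is not HAllA's implementation and does not reproduce its hierarchical, block-wise testing. It is a minimal illustration, assuming Spearman correlation as the similarity measure, a permutation test for p-values, and Benjamini-Hochberg as the FDR procedure. All function names and parameters are hypothetical, and both datasets are assumed to be lists of features, each a list of values over the same samples.

```python
import random
from itertools import product


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = rng or random.Random(0)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank  # step-up rule: keep the largest passing rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def all_against_all(X, Y, alpha=0.05, n_perm=999, seed=0):
    """Test every feature of X against every feature of Y, then apply BH."""
    rng = random.Random(seed)
    pairs = list(product(range(len(X)), range(len(Y))))
    pvals = [permutation_pvalue(X[i], Y[j], n_perm, rng) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return [pair for pair, ok in zip(pairs, keep) if ok]
```

Because this baseline tests every feature pair separately, the number of hypotheses grows as the product of the two feature counts, which makes the FDR correction increasingly strict. That loss of power is the limitation the hierarchical, block-wise testing described in the abstract is meant to address.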
Cite
Ghazi, A. R., Sucipto, K., Rahnavard, A., Franzosa, E. A., McIver, L. J., Lloyd-Price, J., … Huttenhower, C. (2022). High-sensitivity pattern discovery in large, paired multiomic datasets. Bioinformatics, 38, I378–I385. https://doi.org/10.1093/bioinformatics/btac232