Motivation: Modern molecular technologies allow the collection of large amounts of high-throughput data on the functional attributes of genes. Often multiple technologies and study designs are used to address the same biological question such as which genes are overexpressed in a specific disease state. Consequently, there is considerable interest in methods that can integrate across datasets to present a unified set of predictions. Results: An important aspect of data integration is being able to account for the fact that datasets may differ in how accurately they capture the biological signal of interest. While many methods to address this problem exist, they always rely either on dataset internal statistics, which reflect data structure and not necessarily biological relevance, or external gold standards, which may not always be available. We present a new rank aggregation method for data integration that requires neither external standards nor internal statistics but relies on Bayesian reasoning to assess dataset relevance. We demonstrate that our method outperforms established techniques and significantly improves the predictive power of rank-based aggregations. We show that our method, which does not require an external gold standard, provides reliable estimates of dataset relevance and allows the same set of data to be integrated differently depending on the specific signal of interest.
CITATION STYLE
Badgeley, M. A., Sealfon, S. C., & Chikina, M. D. (2015). Hybrid Bayesian-rank integration approach improves the predictive power of genomic dataset aggregation. Bioinformatics, 31(2), 209–215. https://doi.org/10.1093/bioinformatics/btu518
Mendeley helps you to discover research relevant for your work.