clubber: removing the bioinformatics bottleneck in big data analyses

Maximilian Miller; Chengsheng Zhu; Yana Bromberg

Journal Article (Open Access)

Journal of integrative bioinformatics (2017) 14(2)

DOI: 10.1515/jib-2017-0020

Abstract

With the advent of modern-day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high-performance computing (HPC) resources. Notably, these resources can be added on demand, can include cloud-based resources, and can span heterogeneous environments. The second goal, making HPC resources user-friendly, is facilitated by an interactive web interface and a RESTful API that allow for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) illustrate the importance of clubber in the everyday computational biology environment.

Citation (APA)

Miller, M., Zhu, C., & Bromberg, Y. (2017). clubber: removing the bioinformatics bottleneck in big data analyses. Journal of Integrative Bioinformatics, 14(2). https://doi.org/10.1515/jib-2017-0020
