Optimizing high performance big data cancer workflows

Ivan Jimenez-Ruiz; Ricardo Gonzalez-Mendez; Alexander Ropelewski

Conference Proceedings

ACM International Conference Proceeding Series (2017) Part F128771

DOI: 10.1145/3093338.3093372

Abstract

Appropriate optimization of bioinformatics workflows is vital for the timely discovery of variants implicated in cancer. Sequenced human brain tumor data were assembled to optimize tool implementations and to run the various components of RNA sequencing (RNA-seq) workflows. The measurable information produced by these tools accounts for the success rate and overall efficiency of a standardized and simultaneous analysis. We used the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) database to retrieve two transcriptomic datasets containing over 104 million reads as input data. We used these datasets to benchmark various file systems on the Bridges supercomputer, with the goal of improving overall workflow throughput. Based on program and job timings, we report critical recommendations on selecting appropriate file systems and node types to execute these workflows efficiently.
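
The idea of comparing file systems by timing can be illustrated with a minimal sketch. It is not the authors' benchmark: the abstract does not describe their tooling, so the function names `time_io` and `benchmark`, the file sizes, and the best-of-N policy are all assumptions made for illustration. The sketch writes and reads back a scratch file in each candidate directory and records the wall-clock time.

```python
import os
import tempfile
import time


def time_io(directory, size_mb=8, chunk_mb=1):
    """Write then read back a scratch file in `directory`.

    Returns (write_seconds, read_seconds). Hypothetical helper, not from the paper.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(size_mb // chunk_mb):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # force data to the storage device
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as handle:
            while handle.read(len(chunk)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # leave the directory as we found it
    return write_s, read_s


def benchmark(directories, repeats=3, size_mb=8):
    """Return {directory: (best_write_seconds, best_read_seconds)}."""
    results = {}
    for directory in directories:
        runs = [time_io(directory, size_mb) for _ in range(repeats)]
        results[directory] = (min(r[0] for r in runs), min(r[1] for r in runs))
    return results
```

A call such as `benchmark(["/path/to/fs_a", "/path/to/fs_b"])` would use placeholder paths standing in for mount points of the file systems under test. Read timings measured this way may be served from the operating system's page cache, so a real comparison would more likely time the actual workflow programs and jobs, as the abstract describes.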

Cite

Jimenez-Ruiz, I., Gonzalez-Mendez, R., & Ropelewski, A. (2017). Optimizing high performance big data cancer workflows. In ACM International Conference Proceeding Series (Vol. Part F128771). Association for Computing Machinery. https://doi.org/10.1145/3093338.3093372
