Abstract
Appropriate optimization of bioinformatics workflows is vital to improve the timely discovery of variants implicated in cancer genomics. Sequenced human brain tumor data was assembled to optimize tool implementations and run various components of RNA sequencing (RNA-seq) workflows. The measurable information produced by these tools accounts for the success rate and overall efficiency of a standardized and simultaneous analysis. We used the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) database to retrieve two transcriptomic datasets containing over 104 million reads as input data. We used these datasets to benchmark various file systems on the Bridges supercomputer to improve overall workflow throughput. Based on program and job timings, we report critical recommendations on the selection of appropriate file systems and node types to execute these workflows efficiently.
Citation
Jimenez-Ruiz, I., Gonzalez-Mendez, R., & Ropelewski, A. (2017). Optimizing high performance big data cancer workflows. In ACM International Conference Proceeding Series (Vol. Part F128771). Association for Computing Machinery. https://doi.org/10.1145/3093338.3093372