Optimizing high performance big data cancer workflows

Abstract

Appropriate optimization of bioinformatics workflows is vital to improve the timely discovery of variants implicated in cancer genomics. Sequenced human brain tumor data was assembled to optimize tool implementations and run various components of RNA sequencing (RNA-seq) workflows. The measurable information produced by these tools accounts for the success rate and overall efficiency of a standardized and simultaneous analysis. We used the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) database to retrieve two transcriptomic datasets containing over 104 million reads as input data. We used these datasets to benchmark various file systems on the Bridges supercomputer to improve overall workflow throughput. Based on program and job timings, we report critical recommendations on the selection of appropriate file systems and node types to execute these workflows efficiently.

Cite

Jimenez-Ruiz, I., Gonzalez-Mendez, R., & Ropelewski, A. (2017). Optimizing high performance big data cancer workflows. In ACM International Conference Proceeding Series (Vol. Part F128771). Association for Computing Machinery. https://doi.org/10.1145/3093338.3093372