A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources

  • Morgan J
  • Chapman R
  • Anderson P

Abstract

Next-Generation Sequencing (NGS) has greatly increased the throughput of traditional sequencing. In RNA sequencing, for example, short reads derived from expressed RNA are used to work backward, assembling a transcriptome from the expressed fragments. By sequencing vast numbers of fragments in parallel, these methods lower the cost of sequencing, but they have created computational and storage obstacles. RNA sequencing produces files that can exceed 20 GB, making them unwieldy for the general researcher. Many data resources have been created to extract the information contained in these massively parallel data, and these resources and their associated tools are heterogeneous and highly distributed. As a result, the scientist must create and execute highly complex, customized analyses that gather and organize data from heterogeneous sources while interfacing with a variety of software tools. This is often infeasible without the support of computer specialists and significant hardware upgrades. The objective of this work is to combine the strengths of the scientific workflow project Galaxy with traditional high performance computing (HPC) resources and affordable cloud-based data storage that encourages collaboration. The specific goals are to create a Galaxy tool that executes Trinity on a remote cluster and to develop software to download, upload, and manage NGS results to and from the cloud. Trinity is a computationally expensive de novo assembler that processes large RNA read files to produce a transcriptome in FASTA format. By building on the Google Drive cloud storage service, our system supports broad collaboration and distribution: RNA sequence data uploaded once can be shared, analyzed, and reused for multiple purposes. The system can also remotely execute Trinity and other NGS tools on traditional HPC infrastructure, which is often available as a shared resource at universities.
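
A Galaxy tool that runs Trinity remotely must, at minimum, turn the user's selected read files into a batch job for the cluster. The abstract does not specify the scheduler or the submission mechanism, so the Python sketch below assumes a SLURM cluster reachable over SSH and a Trinity 2.x installation; the function names, host, and resource defaults are hypothetical. It builds the job script as text and pipes it to `sbatch` on a login node.

```python
import shlex
import subprocess


def build_trinity_job(left_fq, right_fq, out_dir="trinity_out_dir", cpus=16,
                      max_memory="50G", job_name="trinity", walltime="48:00:00"):
    """Return the text of a SLURM batch script that runs Trinity on paired-end reads.

    Assumptions (not from the original system): the cluster uses SLURM, and a
    Trinity 2.x executable named `Trinity` is on the compute node's PATH.
    """
    trinity_cmd = [
        "Trinity",
        "--seqType", "fq",           # FASTQ input
        "--left", left_fq,           # mate 1 reads
        "--right", right_fq,         # mate 2 reads
        "--CPU", str(cpus),
        "--max_memory", max_memory,
        "--output", out_dir,         # default matches Trinity's own default name
    ]
    header = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={max_memory}",
        f"#SBATCH --time={walltime}",
    ]
    # shlex.quote protects file names containing spaces or shell metacharacters.
    command = " ".join(shlex.quote(arg) for arg in trinity_cmd)
    return "\n".join(header + ["", command]) + "\n"


def submit_remote(script, host):
    """Pipe a batch script to `sbatch` on a remote login node over SSH.

    `host` is a hypothetical login node such as "user@hpc.example.edu";
    sbatch reads the script from standard input when no file is given.
    """
    result = subprocess.run(["ssh", host, "sbatch"], input=script,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()  # sbatch reports the assigned job id here
```

Keeping script generation separate from submission lets the script be inspected or tested without a cluster; a PBS/Torque site would swap the `#SBATCH` directives for their `#PBS` equivalents.
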
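
Moving files that can exceed 20 GB to and from cloud storage makes an integrity check worthwhile. The abstract does not describe how transfers are verified; the sketch below is an assumption that streams a file through MD5 in fixed-size chunks and compares the digest with a checksum reported by the storage service (the Google Drive API, for instance, exposes an `md5Checksum` field for binary files). The function names are hypothetical.

```python
import hashlib


def iter_file_chunks(path, chunk_size=8 * 1024 * 1024):
    """Yield a file's bytes in fixed-size chunks so multi-GB files never sit in memory."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            yield chunk


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def transfer_verified(local_path, remote_md5):
    """True if the local file matches a checksum reported by the remote store."""
    return md5_of_chunks(iter_file_chunks(local_path)) == remote_md5.strip().lower()
```

Hashing incrementally keeps memory use constant regardless of file size, which matters more here than hashing speed.
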
Specifically, we develop a Trinity-based workflow, executed in a heterogeneous environment with batch HPC resources, that quantifies the abundances of RNA fragments to discover patterns of expression.
To demonstrate our workflow system with cloud-based storage and sharing, we will analyze RNAseq data from duplicate samples of ovarian biopsies, generated by 72-base, paired-end sequencing on an Illumina Genome Analyzer IIX. Sequencing used a balanced block design, with pooled, barcoded samples from 8 fish run in each lane and duplicate lanes serving as sequencing technical replicates. The Illumina-based RNAseq approach offers much higher sensitivity for low-abundance transcripts and far deeper sequencing coverage than the Roche 454 pyrosequencing originally used. We therefore expect it to reveal thousands of new ovarian gene transcripts once the RNAseq reads are quality filtered, assembled de novo into contigs, and compared to the existing striped bass ovarian transcriptome, especially to the tens of thousands of singleton sequences that we obtained but have not previously verified or published. This process will yield a far more comprehensive ovarian transcriptome, representing the overwhelming majority of genes expressed in the striped bass ovary during oocyte growth and maturation.
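
Because barcoded samples from 8 fish were pooled in each lane, reads must be assigned back to their fish before per-sample analysis. The abstract does not describe this step; the sketch below assumes each read's observed barcode is already available (for example, from an index read), tolerates one mismatch, and leaves ambiguous or unmatched reads unassigned. Sample names and barcode sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed, barcodes, max_mismatches=1):
    """Return the sample whose barcode is within max_mismatches of `observed`, or None.

    Reads equally close to two barcodes are treated as ambiguous and left unassigned.
    """
    best, best_dist, tied = None, max_mismatches + 1, False
    for sample, barcode in barcodes.items():
        if len(barcode) != len(observed):
            continue
        dist = hamming(observed, barcode)
        if dist < best_dist:
            best, best_dist, tied = sample, dist, False
        elif dist == best_dist:
            tied = True
    return None if tied or best is None else best


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (read_id, barcode, sequence) tuples by sample; the rest go to 'undetermined'."""
    bins = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for read_id, observed, seq in reads:
        sample = assign_sample(observed, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append((read_id, seq))
    return bins
```

Allowing a single mismatch recovers reads with one sequencing error in the barcode, while the tie check keeps a read from being credited to the wrong fish when barcodes are close.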
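
Quality filtering is named but not specified. One simple approach, assumed here, trims low-quality bases from the 3' end of each read and discards the pair if either mate becomes too short. The thresholds (Phred 20, at least 36 of the 72 bases) and function names are hypothetical, and the default quality offset of 33 would need to be 64 for data from older Illumina pipeline versions.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string; Illumina pipelines 1.3 to 1.7 used offset 64."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, quality, min_q=20, min_len=36, offset=33):
    """Trim bases below min_q from the 3' end; return None if the read gets too short."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


def filter_pair(mate1, mate2, **params):
    """Keep a read pair only if both mates survive trimming, so the pair stays in sync."""
    trimmed1 = trim_read(*mate1, **params)
    trimmed2 = trim_read(*mate2, **params)
    if trimmed1 is None or trimmed2 is None:
        return None
    return trimmed1, trimmed2
```

Dropping both mates together keeps the left and right files in matching order, which paired-end assemblers such as Trinity rely on.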
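
The abstract does not say which abundance measure the workflow reports. As one common choice, transcripts per million (TPM), which RSEM also reports, divides each transcript's read count c_i by its length l_i and rescales so the values sum to one million: TPM_i = (c_i / l_i) / sum_j (c_j / l_j) x 10^6. The sketch below is illustrative only: it uses raw transcript lengths where real quantifiers use effective lengths, and it assumes counts come from mapping reads back to the Trinity assembly.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and transcript lengths (in bases)."""
    if set(counts) != set(lengths):
        raise ValueError("counts and lengths must cover the same transcripts")
    # Reads per base for each transcript, so long transcripts are not favored.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1e6 for t, rate in rates.items()}
```

Because TPM values always sum to the same total, they can be compared across the duplicate lanes used as technical replicates, which raw counts cannot when sequencing depth differs.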

Citation (APA):

Morgan, J. C., Chapman, R. W., & Anderson, P. E. (2012). A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources (p. 594). Association for Computing Machinery (ACM). https://doi.org/10.1145/2382936.2383033
