LightAssembler: Fast and memory-efficient assembly algorithm for high-throughput sequencing reads

Abstract

Motivation: The deluge of sequencing data has outpaced Moore's Law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data at high speed and fixed cost, but will lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory before their workflows can start, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine advanced computing techniques with innovative data structures to encode the assembly graph efficiently in computer memory.

Results: LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters: one holds a uniform sample of g-spaced sequenced k-mers, and the other holds k-mers classified as likely correct by a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to other competing tools. On benchmark datasets from the GAGE and Assemblathon projects, our method reduces memory usage by 50% compared to the resource-efficient assemblers. Although LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.
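
The two-filter idea described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the `BloomFilter` class is a plain (not cache-oblivious) filter built on SHA-256, "g-spaced" is taken to mean sampling every g-th k-mer position in a read, and the abstract's unspecified statistical test is replaced by a simple stand-in threshold (`min_fraction`) on the share of a read's k-mers found in the sample filter. The names `kmers`, `build_filters` and `min_fraction` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (illustrative; not the cache-oblivious variant)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k, step=1):
    """Return the k-mers of a read, taking one every `step` positions."""
    return [read[i:i + k] for i in range(0, len(read) - k + 1, step)]


def build_filters(reads, k, g, min_fraction=0.5):
    # Pass 1: store a sample of k-mers, one every g positions (assumed spacing).
    sampled = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k, step=g):
            sampled.add(kmer)

    # Pass 2: stand-in for the statistical test. A read's k-mers are kept as
    # "likely correct" when enough of them are already in the sample filter.
    trusted = BloomFilter()
    for read in reads:
        all_kmers = kmers(read, k)
        if not all_kmers:
            continue
        hits = sum(1 for kmer in all_kmers if kmer in sampled)
        if hits / len(all_kmers) >= min_fraction:
            for kmer in all_kmers:
                trusted.add(kmer)
    return sampled, trusted
```

For example, `build_filters(["ACGTACGTAC"], k=4, g=3)` samples the k-mers at read positions 0, 3 and 6 into the first filter, then uses that filter to decide which k-mers enter the second. Because Bloom filters admit false positives but never false negatives, any k-mer that was inserted is always reported as present.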

El-Metwally, S., Zakaria, M., & Hamza, T. (2016). LightAssembler: Fast and memory-efficient assembly algorithm for high-throughput sequencing reads. Bioinformatics, 32(21), 3215–3223. https://doi.org/10.1093/bioinformatics/btw470
