Improved parallel processing of massive De Bruijn graph for genome assembly

Abstract

The De Bruijn graph is now widely used in genome assembly software. Such graphs can reach billions of vertices and edges, which poses great challenges for the assembly task, so it is important to study scalable genome assembly algorithms. Although some recent works have begun to address the scalability problem with parallel assembly algorithms, processing a massive De Bruijn graph is still very time consuming and calls for optimized operations. In this paper, we aim to significantly improve the efficiency of massive De Bruijn graph processing. Specifically, the time-consuming and memory-intensive phases are De Bruijn graph construction and simplification. We observe that the existing list ranking approach repeatedly performs parallel global sorting over all De Bruijn graph vertices, which results in a huge amount of communication between computing nodes. We therefore propose to perform a single depth-first traversal over the underlying De Bruijn graph to achieve the same objective as the existing list ranking approach. The new method is fast, effective, and can be executed in parallel. It has a computing complexity of O(g/p) and a communication complexity of O(g), which is lower than for the existing list ranking approach, where g is the length of the reference genome and p is the number of processors. Our experimental results on error-free data show that, when the number of processors scales from 8 to 128, our algorithm achieves a 10-fold speedup on simulated data for yeast and C. elegans. © 2013 Springer-Verlag.
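To make the two phases named in the abstract concrete, the following is a minimal single-process sketch of De Bruijn graph construction and of simplification by walking each unbranched path once. It is not the authors' parallel algorithm: the function names `build_de_bruijn` and `compact_paths`, the choice of (k-1)-mers as vertices, and the serial walk are all assumptions made for illustration, and reverse complements, sequencing errors, and distribution across processors are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Construction phase (illustrative): vertices are (k-1)-mers,
    and every distinct k-mer in the reads contributes one directed edge."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            u, v = kmer[:-1], kmer[1:]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def compact_paths(succ, pred):
    """Simplification phase (illustrative): merge each unbranched chain
    of vertices into one sequence by walking it exactly once.

    Assumption: isolated cycles made only of 1-in/1-out vertices are
    not reported, because no walk starts inside them."""
    nodes = set(succ) | set(pred)

    def is_interior(n):
        # A vertex inside a chain has exactly one predecessor and one successor.
        return len(pred.get(n, ())) == 1 and len(succ.get(n, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_interior(start):
            continue  # walks begin only at chain ends or branch points
        for nxt in sorted(succ.get(start, ())):
            path = start + nxt[-1]
            cur = nxt
            while is_interior(cur):
                cur = next(iter(succ[cur]))
                path += cur[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    succ, pred = build_de_bruijn(["ACGTT", "GTTGCA"], k=4)
    print(compact_paths(succ, pred))
```

Each edge is visited once in this sketch, so no repeated global sort over the vertices is needed. In a distributed setting the vertices would be partitioned across processors, and a walk that crosses a partition boundary would need a message to the owner of the next vertex; that part is omitted here.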

Zeng, L., Cheng, J., Meng, J., Wang, B., & Feng, S. (2013). Improved parallel processing of massive De Bruijn graph for genome assembly. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7808 LNCS, pp. 96–107). https://doi.org/10.1007/978-3-642-37401-2_12
