Recoloring the colored de Bruijn graph

Abstract

The colored de Bruijn graph, an extension of the de Bruijn graph, is routinely applied to variant calling, genotyping, genome assembly, and various other tasks [11]. In this data structure, each edge is labeled with one or more colors from a set {c1, …, cα}, and the labels are stored as an m×α matrix, where m is the number of edges. Recently, a significant amount of work has gone into developing compact representations of this color matrix, but all existing methods have focused on compressing the matrix itself [3, 10, 12, 14]. In this paper, we explore the problem of recoloring the graph in order to reduce the number of colors and thus decrease the size of the color matrix. We show that finding the minimum number of colors needed for recoloring is not only NP-hard but also difficult to approximate within a reasonable factor. These hardness results motivate the recoloring heuristic that we present in this paper. Our results show that this heuristic reduces the number of colors by between one and two orders of magnitude. More specifically, when the number of colors is large (>5,000,000), our heuristic reduces it by a factor of 136. An implementation of this heuristic is publicly available at https://github.com/baharpan/cosmo/tree/Recoloring.
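
To make the m×α color matrix concrete, the following Python sketch builds it for a toy edge set and then merges columns with a simple first-fit pass. It is not the heuristic from the paper, which the abstract does not describe. It assumes, for illustration only, that two original colors may share a new color when they label no edge in common, and it does not address how the original colors would be recovered afterwards. The function names `build_color_matrix`, `greedy_recolor`, and `apply_recoloring` are hypothetical.

```python
from typing import Dict, List, Set


def build_color_matrix(edge_colors: List[Set[int]], num_colors: int) -> List[List[int]]:
    """Return the m x alpha 0/1 matrix: one row per edge, one column per color."""
    return [[1 if c in colors else 0 for c in range(num_colors)]
            for colors in edge_colors]


def greedy_recolor(matrix: List[List[int]]) -> Dict[int, int]:
    """First-fit merge (illustrative assumption, not the paper's heuristic).

    Each original color is mapped to the first new color whose already
    covered edges are disjoint from the edges carrying the original color.
    """
    num_colors = len(matrix[0]) if matrix else 0
    # Set of edge indices labeled by each original color (one per column).
    columns = [{e for e, row in enumerate(matrix) if row[c]}
               for c in range(num_colors)]
    covered_by_new: List[Set[int]] = []  # edges covered by each new color
    mapping: Dict[int, int] = {}
    for c, edges in enumerate(columns):
        for new_color, covered in enumerate(covered_by_new):
            if covered.isdisjoint(edges):
                covered |= edges
                mapping[c] = new_color
                break
        else:
            # No compatible new color exists yet, so open a fresh one.
            covered_by_new.append(set(edges))
            mapping[c] = len(covered_by_new) - 1
    return mapping


def apply_recoloring(matrix: List[List[int]], mapping: Dict[int, int]) -> List[List[int]]:
    """Rewrite the matrix over the new, smaller color set."""
    width = max(mapping.values()) + 1 if mapping else 0
    out = [[0] * width for _ in matrix]
    for e, row in enumerate(matrix):
        for c, bit in enumerate(row):
            if bit:
                out[e][mapping[c]] = 1
    return out


# Toy input: five edges, four colors; edge i carries the colors in edge_colors[i].
edge_colors = [{0, 1}, {1}, {2}, {0, 3}, {2}]
matrix = build_color_matrix(edge_colors, num_colors=4)
mapping = greedy_recolor(matrix)
smaller = apply_recoloring(matrix, mapping)
```

On this toy input the matrix shrinks from four columns to two. Under the disjointness assumption, choosing the fewest new colors amounts to coloring a conflict graph over the original colors, which is one way to see why a heuristic is used rather than an exact method.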

Cite

Alipanahi, B., Kuhnle, A., & Boucher, C. (2018). Recoloring the colored de Bruijn graph. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11147 LNCS, pp. 1–11). Springer Verlag. https://doi.org/10.1007/978-3-030-00479-8_1
