The colored de Bruijn graph, an extension of the de Bruijn graph, is routinely applied for variant calling, genotyping, genome assembly, and various other applications [11]. In this data structure, the edges are labeled with one or more colors from a set {c1 … cα}, and are stored as a m×α matrix, where m is the number of edges. Recently, there has been a significant amount of work in developing compacted representations of this color matrix but all existing methods have focused on compressing the color matrix [3, 10, 12, 14]. In this paper, we explore the problem of recoloring the graph in order to reduce the number of colors, and thus, decrease the size of the color matrix. We show that finding the minimum number of colors needed for recoloring is not only NP-hard but also, difficult to approximate within a reasonable factor. These hardness results motivate the need for a recoloring heuristic that we present in this paper. Our results show that this heuristic is able to reduce the number of colors between one and two orders of magnitude. More specifically, when the number of colors is large (>5,000,000) the number of colors is reduced by a factor of 136 by our heuristic. An implementation of this heuristic is publicly available at https://github.com/baharpan/cosmo/tree/Recoloring.
CITATION STYLE
Alipanahi, B., Kuhnle, A., & Boucher, C. (2018). Recoloring the colored de Bruijn graph. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11147 LNCS, pp. 1–11). Springer Verlag. https://doi.org/10.1007/978-3-030-00479-8_1
Mendeley helps you to discover research relevant for your work.