A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Timo Beller; Enno Ohlebusch

Journal Article, Open Access

Algorithms for Molecular Biology (2016) 11(1)

DOI: 10.1186/s13015-016-0083-7

14 Citations

42 Readers

Abstract

Background: Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed using a compressed de Bruijn graph to describe the relationship between the genomes of many individuals or strains of the same or closely related species. They devised an O(n log g) time algorithm called splitMEM that constructs this graph directly (i.e., without first building the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497-504, 2016) improved their result.

Results: In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g., an allele, a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.
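To make the central object concrete, the following is a minimal Python sketch of what a compressed de Bruijn graph is. It is not splitMEM and not the representation proposed in the paper: it takes the naive route of first building the uncompressed graph and then merging non-branching chains, which is exactly the detour the cited algorithms avoid. The function names (`build_de_bruijn`, `compress`, `spells_path`), the toy genomes and the choice k = 3 are all assumptions made for illustration. The path check is likewise only a stand-in for search: a pattern can spell a path in the graph without occurring in any single genome.

```python
def build_de_bruijn(genomes, k):
    """Collect k-mer nodes and edges between k-mers adjacent in some genome."""
    nodes, succ, pred = set(), {}, {}
    for g in genomes:
        for i in range(len(g) - k + 1):
            kmer = g[i:i + k]
            nodes.add(kmer)
            if i + k < len(g):  # a following k-mer exists in this genome
                nxt = g[i + 1:i + k + 1]
                succ.setdefault(kmer, set()).add(nxt)
                pred.setdefault(nxt, set()).add(kmer)
    return nodes, succ, pred


def compress(nodes, succ, pred):
    """Merge maximal non-branching chains; return the merged node labels."""
    def starts_chain(v):
        preds = pred.get(v, set())
        if len(preds) != 1:
            return True
        (p,) = preds
        return len(succ[p]) != 1  # the unique predecessor branches

    ordered = sorted(nodes)
    # Chain starts come first; the second pass only picks up branch-free cycles.
    candidates = [v for v in ordered if starts_chain(v)] + ordered
    labels, seen = [], set()
    for v in candidates:
        if v in seen:
            continue
        seen.add(v)
        label, cur = v, v
        while len(succ.get(cur, ())) == 1:
            (nxt,) = succ[cur]
            if nxt in seen or len(pred[nxt]) != 1:
                break
            label += nxt[-1]  # extend the label by one character
            seen.add(nxt)
            cur = nxt
        labels.append(label)
    return labels


def spells_path(pattern, k, nodes, succ):
    """True if the pattern (length >= k) spells a path in the uncompressed graph."""
    kmers = [pattern[i:i + k] for i in range(len(pattern) - k + 1)]
    if not kmers or kmers[0] not in nodes:
        return False
    return all(b in succ.get(a, ()) for a, b in zip(kmers, kmers[1:]))


# Toy pan-genome of two hypothetical strains sharing the prefix "ACG".
genomes = ["ACGTT", "ACGAT"]
nodes, succ, pred = build_de_bruijn(genomes, 3)
print(compress(nodes, succ, pred))           # ['ACG', 'CGAT', 'CGTT']
print(spells_path("ACGAT", 3, nodes, succ))  # True
print(spells_path("ACGTA", 3, nodes, succ))  # False
```

In this toy case the shared prefix becomes a single node and each strain-specific suffix becomes its own node. This illustrates how a compressed graph exposes common and variable regions across genomes.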

Cite

Beller, T., & Ohlebusch, E. (2016). A representation of a compressed de Bruijn graph for pan-genome analysis that enables search. Algorithms for Molecular Biology, 11(1). https://doi.org/10.1186/s13015-016-0083-7
