High-throughput sequencing technologies have become fast and cheap in recent years. As a result, large-scale projects have begun sequencing tens to several thousands of genomes per species, producing a large number of sequences sampled from each genome. Such a highly redundant collection of very similar sequences is called a pan-genome. It can be transformed into a set of sequences "colored" by the genomes to which they belong. A colored de Bruijn graph (C-DBG) extracts all colored k-mers (strings of length k) from these sequences and stores them as vertices. In this paper, we present the Bloom Filter Trie, an alignment-free, reference-free and incremental data structure for storing a pan-genome as a C-DBG. The data structure stores and compresses a set of colored k-mers and also allows the graph to be traversed efficiently. Experimental results show better performance than another state-of-the-art data structure.
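To make "colored k-mers" and graph traversal concrete, the following is a minimal sketch that assumes a plain Python dictionary in place of the Bloom Filter Trie. It is not the paper's data structure and performs no compression. Each k-mer maps to the set of genome indices ("colors") in which it occurs. Successors in the node-centric de Bruijn graph are found by querying the four one-character extensions of the k-mer's (k-1)-suffix. The function names `colored_kmers` and `successors` are hypothetical. For simplicity, the sketch also assumes forward-strand k-mers only, with no reverse-complement canonicalization.

```python
from collections import defaultdict

def colored_kmers(genomes, k):
    """Map each k-mer to the set of genome indices (colors) containing it.

    `genomes` is a list where each entry is a list of DNA strings
    belonging to one genome. Hypothetical helper; forward strand only.
    """
    table = defaultdict(set)
    for color, sequences in enumerate(genomes):
        for seq in sequences:
            # Slide a window of length k over the sequence.
            for i in range(len(seq) - k + 1):
                table[seq[i:i + k]].add(color)
    return dict(table)

def successors(table, kmer):
    """Return stored k-mers that overlap `kmer` by k-1 characters on the right.

    Traversal works by membership queries: try all four possible
    one-character extensions of the (k-1)-suffix.
    """
    suffix = kmer[1:]
    return [suffix + c for c in "ACGT" if suffix + c in table]

genomes = [["ACGTA"], ["ACGTT"]]  # two toy genomes, one sequence each
table = colored_kmers(genomes, 3)
print(table["CGT"])               # {0, 1}: shared by both genomes
print(successors(table, "CGT"))   # ['GTA', 'GTT']: the graph branches here
```

Because traversal needs only membership queries, any structure that can answer "is this k-mer present, and with which colors?" can support it. That is the role a compressed structure would play in place of the dictionary used here.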
Holley, G., Wittler, R., & Stoye, J. (2015). Bloom filter Trie–A data structure for Pan-Genome storage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9289, pp. 217–230). Springer Verlag. https://doi.org/10.1007/978-3-662-48221-6_16