Bloom filter Trie–A data structure for Pan-Genome storage

19Citations
Citations of this article
24Readers
Mendeley users who have this article in their library.
Get full text

Abstract

High throughput sequencing technologies have become fast and cheap in the past years. As a result, large-scale projects started to sequence tens to several thousands of genomes per species, producing a high number of sequences sampled from each genome. Such a highly redundant collection of very similar sequences is called a pan-genome. It can be transformed into a set of sequences “colored” by the genomes to which they belong. A colored de-Bruijn graph (C-DBG) extracts from the sequences all colored k-mers, strings of length k, and stores them in vertices. In this paper, we present an alignment-free, reference-free and incremental data structure for storing a pan-genome as a C-DBG: the Bloom Filter Trie. The data structure allows to store and compress a set of colored k-mers, and also to efficiently traverse the graph. Experimental results prove better performance compared to another state-of-the-art data structure.

Cite

CITATION STYLE

APA

Holley, G., Wittler, R., & Stoye, J. (2015). Bloom filter Trie–A data structure for Pan-Genome storage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9289, pp. 217–230). Springer Verlag. https://doi.org/10.1007/978-3-662-48221-6_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free