KCMBT: A k-mer Counter based on Multiple Burst Trees


Abstract

Motivation: A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequency of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, a k-mer counter must be fast and efficient on large data sets.

Results: We propose a novel trie-based algorithm for the k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on the human genome data set. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms are run using multiple threads.
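To make the counting problem concrete, here is a minimal Python sketch of what a k-mer counter computes. It is not the KCMBT algorithm: the first function is a plain hash-table baseline, and the second uses an ordinary nested-dictionary trie as a stand-in for the burst trees named in the abstract. All function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-length substring of `sequence` with a hash table."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_trie(sequence: str, k: int) -> dict:
    """Count k-mers by inserting each one into a plain nested-dict trie.

    Assumption: this is an ordinary trie, not a burst trie, and it is
    not the data structure used by KCMBT.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    root = {}
    for i in range(len(sequence) - k + 1):
        node = root
        # Walk (or create) the path for the first k - 1 symbols.
        for base in sequence[i:i + k - 1]:
            node = node.setdefault(base, {})
        # The last symbol's slot holds the count for the whole k-mer.
        last = sequence[i + k - 1]
        node[last] = node.get(last, 0) + 1
    return root


def trie_counts(node: dict, prefix: str = "") -> dict:
    """Flatten the trie back into a {k-mer: count} mapping."""
    counts = {}
    for base, child in node.items():
        if isinstance(child, dict):
            counts.update(trie_counts(child, prefix + base))
        else:
            counts[prefix + base] = child
    return counts
```

For example, `count_kmers("ACGTACGT", 3)` reports that `ACG` and `CGT` each occur twice while `GTA` and `TAC` occur once, and flattening the trie with `trie_counts` yields the same mapping. The trie shares common prefixes between k-mers, which is the general property a trie-based counter can exploit.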

Cite

Mamun, A. A., Pal, S., & Rajasekaran, S. (2016). KCMBT: A k-mer Counter based on Multiple Burst Trees. Bioinformatics, 32(18), 2783–2790. https://doi.org/10.1093/bioinformatics/btw345
