A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours

Abstract

Motivation: In de novo sequence assembly, a standard pre-processing step is k-mer counting, which computes the number of occurrences of every length-k sub-sequence in the sequencing reads. Sequencing errors can produce many k-mers that do not appear in the genome, so counting can require an excessive amount of memory. This issue is particularly serious when the genome to be assembled is large, the sequencing depth is high, or the available memory is limited.

Results: Here, we propose a fast near-exact k-mer counting method, CQF-deNoise, which has a module for dynamically removing noisy false k-mers. It automatically determines when to remove noise and how many rounds of removal to perform, according to a user-specified wrong removal rate. We tested CQF-deNoise comprehensively using data generated from a diverse set of genomes with various data properties, and found that the memory consumed was almost constant regardless of the sequencing error rate, while the noise removal procedure had minimal effects on counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consistently used the least memory, consuming 49-76% less memory than the second-best method. When counting the k-mers from a human dataset with around 60× coverage, the peak memory usage of CQF-deNoise was only 10.9 GB (gigabytes) for k = 28 and 21.5 GB for k = 55. De novo assembly of 106× human sequencing data using CQF-deNoise for k-mer counting required only 2.7 h and 90 GB of peak memory.
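To make the counting step and the role of noise removal concrete, the following is a minimal Python sketch. It is not CQF-deNoise: it uses a plain hash map (`collections.Counter`) rather than a counting quotient filter, and both the trigger `max_entries` and the rule "drop k-mers seen exactly once so far" are hypothetical simplifications assumed here for illustration. It shows why errors inflate memory: a single substitution in a read can create up to k distinct k-mers that do not occur in the genome.

```python
from collections import Counter


def count_kmers(reads, k):
    """Exact k-mer counting with a plain hash map (no noise removal)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def count_kmers_with_denoise(reads, k, max_entries):
    """Toy counter that drops singleton k-mers whenever the table grows
    past max_entries. The trigger and the removal rule are hypothetical
    simplifications, not the CQF-deNoise procedure."""
    counts = Counter()
    rounds = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
        if len(counts) > max_entries:
            # k-mers seen only once so far are most likely caused by errors,
            # but a genuine k-mer seen once so far would be removed wrongly.
            singletons = [kmer for kmer, c in counts.items() if c == 1]
            for kmer in singletons:
                del counts[kmer]
            rounds += 1
    return counts, rounds
```

For example, with five copies of the read `ACGTACGTGA` and one copy `ACGTTCGTGA` carrying a single substitution, `count_kmers(reads, 4)` stores 10 distinct 4-mers, 4 of which exist only because of the error; `count_kmers_with_denoise(reads, 4, max_entries=8)` performs one removal round and keeps only the 6 genuine 4-mers. Any genuine k-mer that happens to have count 1 at removal time would be lost, which is the kind of wrong removal that the user-specified rate in CQF-deNoise is meant to bound.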

Shi, C. H., & Yip, K. Y. (2020). A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours. Bioinformatics, 36, I625–I633. https://doi.org/10.1093/bioinformatics/btaa890