How big is that genome? Estimating genome size and coverage from k-mer abundance spectra

Michal Hozza; Tomáš Vinař; Broňa Brejová

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9309 199-209

DOI: 10.1007/978-3-319-23826-5_20

17 Citations

32 Readers

Abstract

Many practical algorithms for sequence alignment, genome assembly and other tasks represent a sequence as a set of k-mers. Here, we address the problems of estimating genome size and sequencing coverage from sequencing reads, without the need for sequence assembly. Our estimates are based on a histogram of k-mer abundance in the input set of sequencing reads and on probabilistic modeling of distribution of k-mer abundance based on parameters related to the coverage, error rate and repeat structure of the genome. Our method provides reliable estimates even at coverage as low as 0.5 or at error rates as high as 10%.
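To make the notion of a k-mer abundance histogram concrete, the following is a minimal sketch in Python. It is not the authors' method: the paper fits a probabilistic model to the histogram, whereas this sketch uses a much cruder peak heuristic (take the most common abundance as the k-mer coverage, then divide the total k-mer count by it). The function names, the `min_abundance` cutoff, and the toy data are all assumptions made for illustration. The sketch also ignores reverse complements, and a peak heuristic of this kind would not be expected to work at the low coverage or high error rates that the abstract mentions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance j to the number of distinct k-mers seen exactly j times."""
    return Counter(counts.values())


def naive_estimates(hist, min_abundance=2):
    """Crude peak heuristic (an illustrative assumption, not the paper's model).

    Abundances below min_abundance are dropped as likely sequencing errors.
    Returns (k-mer coverage, genome size measured in k-mers).
    """
    solid = {j: n for j, n in hist.items() if j >= min_abundance}
    if not solid:
        raise ValueError("no k-mers at or above min_abundance")
    # The most frequent abundance approximates the k-mer coverage.
    peak = max(solid, key=solid.get)
    # Total k-mer occurrences divided by coverage approximates genome size.
    total = sum(j * n for j, n in solid.items())
    return peak, total / peak


# Toy data: a 16-base sequence with 13 distinct 4-mers, "sequenced" three times.
toy_genome = "ACGTTGCAAGGCTTAC"
toy_reads = [toy_genome] * 3
toy_hist = abundance_histogram(kmer_counts(toy_reads, k=4))
coverage, size = naive_estimates(toy_hist)
print(dict(toy_hist), coverage, size)
```

On this toy input every 4-mer occurs exactly three times, so the histogram has a single entry and the heuristic recovers a k-mer coverage of 3 and a size of 13 k-mers. Real read sets produce a spread of abundances, which is where a fitted model such as the one the paper describes becomes necessary.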

Citation (APA)

Hozza, M., Vinař, T., & Brejová, B. (2015). How big is that genome? Estimating genome size and coverage from k-mer abundance spectra. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9309, pp. 199–209). Springer Verlag. https://doi.org/10.1007/978-3-319-23826-5_20
