Matchtigs: minimum plain text representation of k-mer sets

This article is free to access.

Abstract

We propose a polynomial-time algorithm that computes a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, we shrink the representation by up to 59% over unitigs and 26% over previous work, with only a minor runtime increase. Additionally, the number of strings decreases by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications: it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.
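
To make the notion of a plain-text representation concrete, the following minimal Python sketch shows two sets of strings that spell the same k-mer set but differ in total character count and in number of strings. It is an illustration only, not the authors' algorithm or heuristic: the function names `spelled_kmers` and `total_characters` and the toy sequences are hypothetical, and the sketch assumes k-mers are plain substrings, ignoring reverse complements.

```python
def spelled_kmers(strings, k):
    """Return the set of all length-k substrings of the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def total_characters(strings):
    """Cumulative length of a representation, i.e. its plain-text size."""
    return sum(len(s) for s in strings)


k = 3  # toy value chosen for illustration

# Hypothetical toy data: one string per k-mer versus a single merged string.
one_string_per_kmer = ["ACG", "CGT", "GTA"]
merged = ["ACGTA"]

# Both representations spell exactly the same k-mer set ...
assert spelled_kmers(one_string_per_kmer, k) == spelled_kmers(merged, k)

# ... but the merged one needs fewer characters and fewer strings.
print(total_characters(one_string_per_kmer), len(one_string_per_kmer))
print(total_characters(merged), len(merged))
```

Both quantities in this sketch, cumulative length and string count, correspond to the two measures the abstract reports reductions for.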

Citation (APA)

Schmidt, S., Khan, S., Alanko, J. N., Pibiri, G. E., & Tomescu, A. I. (2023). Matchtigs: minimum plain text representation of k-mer sets. Genome Biology, 24(1). https://doi.org/10.1186/s13059-023-02968-z
