Motivation: One of the most widespread methods used in taxonomic studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short-oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations, which were optimized on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short-oligonucleotide-based method TETRA for identifying bacterial species and as accurate as average nucleotide identity for identifying strains. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.
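To make the general idea of an alignment-free, frequency-based distance concrete, the following is a minimal sketch in Python. It is not the PaSiT algorithm, whose details are not given in this abstract. It assumes plain relative k-mer frequencies over overlapping windows and a Euclidean distance, and the function names `kmer_frequencies` and `genome_distance` are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq.

    Illustrative assumption: overlapping windows, one strand only,
    and windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed k-mer order so vectors from different genomes are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def genome_distance(seq_a, seq_b, k=4):
    """Euclidean distance between two k-mer frequency vectors.

    This metric is a stand-in chosen for simplicity, not the published one.
    """
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))
```

With `k=4` each genome is summarized by 256 numbers and with `k=6` by 4096, so comparing two genomes costs a single pass over each sequence plus one vector comparison, with no alignment step. That is the property that makes this family of methods attractive for large-scale comparisons.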
CITATION STYLE
Goussarov, G., Cleenwerck, I., Mysara, M., Leys, N., Monsieurs, P., … Van Houdt, R. (2020). PaSiT: A novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics, 36(8), 2337–2344. https://doi.org/10.1093/bioinformatics/btz964