Abstract
To understand diversity in enormous collections of genome sequences, we need computationally scalable tools that can quickly contextualize individual genomes based on their similarities and identify the features that make each genome unique. We present WhatsGNU, a tool based on exact match proteomic compression that, in seconds, classifies any new genome and provides a detailed report of protein alleles that may have novel functional differences. We use this technique to characterize the total allelic diversity (panallelome) of Salmonella enterica, Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Staphylococcus aureus, and it could be extended to other species. WhatsGNU is available from https://github.com/ahmedmagds/WhatsGNU.
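As a rough illustration of the idea behind exact match proteomic compression, the sketch below collapses identical protein sequences from many genomes into a single table of occurrence counts, then looks up each protein of a new genome in that table. This is a minimal sketch under stated assumptions, not WhatsGNU's actual implementation: the function names, the toy sequences, and the choice to rank by raw count are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def compress_proteomes(proteomes):
    """Collapse proteins from many genomes into unique sequences with counts.

    `proteomes` maps a genome name to a list of protein sequences
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter()
    for proteins in proteomes.values():
        counts.update(proteins)  # identical sequences share one entry
    return counts


def report_novelty(query_proteins, counts):
    """Return (sequence, exact-match count) pairs, rarest alleles first."""
    pairs = [(seq, counts.get(seq, 0)) for seq in query_proteins]
    return sorted(pairs, key=lambda pair: pair[1])


# Toy database of two genomes (made-up sequences, not real data).
database = {
    "genome_A": ["MKTAYIAK", "MSLNQ"],
    "genome_B": ["MKTAYIAK", "MSLNE"],
}
counts = compress_proteomes(database)

# A count of 0 marks an allele with no exact match in the database.
report = report_novelty(["MKTAYIAK", "MSLNX"], counts)
for sequence, hits in report:
    print(sequence, hits)
```

Because the lookup is an exact string match against a hash table, the cost of querying a new genome in this sketch grows with the number of its proteins rather than with the number of genomes in the database, which is one plausible reason such an approach can run quickly.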
Moustafa, A. M., & Planet, P. J. (2020). WhatsGNU: A tool for identifying proteomic novelty. Genome Biology, 21(1). https://doi.org/10.1186/s13059-020-01965-w