PfaSTer: a machine learning-powered serotype caller for Streptococcus pneumoniae genomes

Abstract

Streptococcus pneumoniae (pneumococcus) is a leading cause of morbidity and mortality worldwide. Although multi-valent pneumococcal vaccines have curbed the incidence of disease, their introduction has shifted serotype distributions, which must be monitored. Whole genome sequence (WGS) data provide a powerful surveillance tool for tracking isolate serotypes, which can be determined from the nucleotide sequence of the capsular polysaccharide biosynthetic operon (cps). Although software exists to predict serotypes from WGS data, most tools are constrained by requiring high-coverage next-generation sequencing reads. This can present a challenge for accessibility and data sharing. Here we present PfaSTer, a machine learning-based method to identify 65 prevalent serotypes from assembled S. pneumoniae genome sequences. PfaSTer combines dimensionality reduction from k-mer analysis with a Random Forest classifier for rapid serotype prediction. By leveraging the model's built-in statistical framework, PfaSTer determines confidence in its predictions without the need for coverage-based assessments. We demonstrate the robustness of this method, which returns >97% concordance when compared to biochemical results and other in silico serotyping tools. PfaSTer is open source and available at: https://github.com/pfizer-opensource/pfaster.
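
The two ideas named in the abstract, a reduced k-mer representation and a confidence derived from the classifier itself, can be illustrated with a minimal sketch. This is not PfaSTer's code: the function names, the value of k, the selected k-mer list, and the toy sequence are all hypothetical, and treating the fraction of agreeing tree votes as the confidence is an assumption about how a Random Forest's built-in statistics might be used. The forest itself is not implemented here; the votes are supplied by hand.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping k-mer made only of A/C/G/T in a sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing N or other ambiguity codes.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def feature_vector(sequence, selected_kmers, k):
    """Keep counts only for a pre-chosen k-mer subset.

    Restricting to a fixed subset is a simple stand-in for
    dimensionality reduction (assumption, not the paper's procedure).
    """
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in selected_kmers]

def vote_confidence(tree_votes):
    """Return the majority label and the fraction of trees that chose it."""
    tally = Counter(tree_votes)
    label, n_votes = tally.most_common(1)[0]
    return label, n_votes / len(tree_votes)

# Hypothetical toy inputs, for illustration only.
toy_sequence = "ATGATGCCN"
selected = ["ATG", "GCC", "TTT"]
print(feature_vector(toy_sequence, selected, k=3))   # [2, 1, 0]

# Hand-written votes standing in for the trees of a trained forest.
print(vote_confidence(["19A"] * 8 + ["19F"] * 2))    # ('19A', 0.8)
```

Because the confidence comes from agreement among trees rather than from read depth, a sketch like this needs only an assembled sequence, which matches the abstract's point about avoiding coverage-based assessments.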

Cite

Lee, J. T., Li, X., Hyde, C., Liberator, P. A., & Hao, L. (2023). PfaSTer: a machine learning-powered serotype caller for Streptococcus pneumoniae genomes. Microbial Genomics, 9(6). https://doi.org/10.1099/mgen.0.001033
