Biological machine learning combined with campylobacter population genomics reveals virulence gene allelic variants cause disease

16Citations
Citations of this article
25Readers
Mendeley users who have this article in their library.

Abstract

Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed µPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms.

Cite

CITATION STYLE

APA

Bandoy, D. D. R., & Weimer, B. C. (2020). Biological machine learning combined with campylobacter population genomics reveals virulence gene allelic variants cause disease. Microorganisms, 8(4). https://doi.org/10.3390/microorganisms8040549

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free