Machine learning for the identification of respiratory viral attachment machinery from sequences data

0Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.

Abstract

At the outset of an emergent viral respiratory pandemic, sequence data is among the first molecular information available. As viral attachment machinery is a key target for therapeutic and prophylactic interventions, rapid identification of viral “spike” proteins from sequence can significantly accelerate the development of medical countermeasures. For six families of respiratory viruses, covering the vast majority of airborne and droplet-transmitted diseases, host cell entry is mediated by the binding of viral surface glycoproteins that interact with a host cell receptor. In this report it is shown that sequence data for an unknown virus belonging to one of the six families above provides sufficient information to identify the protein(s) responsible for viral attachment. Random forest models that take as input a set of respiratory viral sequences can classify the protein as “spike” vs. non-spike based on predicted secondary structure elements alone (with 97.3% correctly classified) or in combination with N-glycosylation related features (with 97.0% correctly classified). Models were validated through 10-fold cross-validation, bootstrapping on a class-balanced set, and an out-of-sample extra-familial validation set. Surprisingly, we showed that secondary structural elements and N-glycosylation features were sufficient for model generation. The ability to rapidly identify viral attachment machinery directly from sequence data holds the potential to accelerate the design of medical countermeasures for future pandemics. Furthermore, this approach may be extendable for the identification of other potential viral targets and for viral sequence annotation in general in the future.

Cite

CITATION STYLE

APA

Walker, K. C., Shwarts, M., Demidikin, S., Chakravarty, A., & Joseph-McCarthy, D. (2023). Machine learning for the identification of respiratory viral attachment machinery from sequences data. PLoS ONE, 18(3 March). https://doi.org/10.1371/journal.pone.0281642

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free