Conserved molecular signatures in the spike protein provide evidence indicating the origin of SARS-CoV-2 and a pangolin-CoV (MP789) by recombination(s) between specific lineages of sarbecoviruses

7Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Both SARS-CoV-2 and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2, sequences for the spike and nucleocapsid proteins from sarbecoviruses were analyzed to identify molecular markers consisting of conserved inserts or deletions (termed CSIs) that are specific for either a particular clade of Sarbecovirus or are commonly shared by two or more clades of these viruses. Three novel CSIs in the N-terminal domain (NTD) of the spike protein S1-subunit (S1-NTD) are uniquely shared by SARS-CoV-2, Bat-CoV-RaTG13 and most pangolin CoVs (SARS-CoV-2r clade). Three other sarbecoviruses viz. bat-CoVZXC21, -CoVZC45 and -PrC31 (forming CoVZC/PrC31 clade), and a pangolin-CoV_MP789 also contain related CSIs in the same positions. In contrast to the S1-NTD, both SARS and SARS-CoV-2r viruses contain two large CSIs in the S1-C-terminal domain (S1-CTD) that are absent in the CoVZC/PrC31 clade. One of these CSIs, consisting of a 12 aa insert, is also present in the RShSTT clade (Cambodia-CoV strains). Sequence similarity studies show that the S1-NTD of SARS-CoV-2r viruses is most similar to the CoVZC/PrC31 clade, whereas their S1-CTD exhibits highest similarity to the RShSTT- (and the SARS-related) CoVs. Results from the shared presence of CSIs and sequence similarity studies on different CoV lineages support the inference that the SARS-CoV-2r cluster of viruses has originated by a genetic recombination between the S1-NTD of the CoVZC/PrC31 clade of CoVs and the S1-CTD of RShSTT/SARS viruses, respectively. We also present compelling evidence, based on the shared presence of CSIs and sequence similarity studies, that the pangolin-CoV_MP789, whose receptor-binding domain is most similar to the SARS-CoV-2 virus, has resulted from another independent recombination event involving the S1-NTD of the CoVZC/PrC31 CoVs and the S1-CTD of an unidentified SARS-CoV-2r related virus. The SARS-CoV-2 virus involved in this latter recombination event is postulated to be most similar to the SARS-CoV-2. Several other CSIs reported here are specific for other clusters of sarbecoviruses including a clade consisting of bat-SARS-CoVs (BM48-31/BGR/2008 and SARS_BtKY72). Structural mapping studies show that the identified CSIs form distinct loops/patches on the surface of the spike protein. It is hypothesized that these novel loops/patches on the spike protein, through their interactions with other host components, should play important roles in the biology/pathology of SARS-CoV-2 virus. Lastly, the CSIs specific for different clades of sarbecoviruses including SARS-CoV-2r clade provide novel means for the identification of these viruses and other potential applications.

Cite

CITATION STYLE

APA

Khadka, B., & Gupta, R. S. (2021). Conserved molecular signatures in the spike protein provide evidence indicating the origin of SARS-CoV-2 and a pangolin-CoV (MP789) by recombination(s) between specific lineages of sarbecoviruses. PeerJ, 9. https://doi.org/10.7717/peerj.12434

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free