The human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the major pandemic of the 21st century. We analyzed >4,700 SARS-CoV-2 genomes and associated metadata retrieved from public repositories. SARS-CoV-2 genomes share high sequence identity (>99.9%), which drops to >96% when they are compared with bat coronavirus. We built a mutation-annotated reference SARS-CoV-2 phylogeny with two main macro-haplogroups, A and B, both of Asian origin, and >160 sub-branches representing virus strains of variable geographical origins worldwide. The phylogeny reveals a uniform occurrence of mutations along branches, which could complicate the design of future vaccines. The root of the SARS-CoV-2 genomes is located in the Chinese haplogroup B1, with a time to the most recent common ancestor (TMRCA) dating to 12 November 2019, matching epidemiological records. Sub-haplogroup A2a originated in China and represents the major non-Asian outbreak. Multiple founder effect episodes, most likely associated with super-spreader hosts, explain the COVID-19 pandemic to a large extent.
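As an illustration of the sequence identity figures quoted above, the following minimal sketch computes percent identity between two pre-aligned nucleotide sequences. The abstract does not state how identity was computed, so the function name `percent_identity`, the decision to skip gap (`-`) and ambiguous (`N`) positions, and the toy sequences are all assumptions made for this example and not the authors' method.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences (hypothetical helper).

    Assumption: columns where either sequence has a gap ('-') or an
    ambiguous base ('N') are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0  # columns where both sequences have a called base
    matches = 0   # compared columns where the bases agree

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments, not real SARS-CoV-2 data.
reference = "ACGTACGTAC"
variant = "ACGTACGTAA"
print(percent_identity(reference, variant))
```

Whether gapped or ambiguous positions are excluded or counted as mismatches changes the reported value, so any comparison between studies should check which convention was used.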
Gómez-Carballa, A., Bello, X., Pardo-Seco, J., Martinón-Torres, F., & Salas, A. (2020, May 19). The impact of super-spreaders in COVID-19: Mapping genome variation worldwide. bioRxiv. https://doi.org/10.1101/2020.05.19.097410