A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees

Abstract

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
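To illustrate the idea of a tree whose branches carry mutations and whose clade roots carry labels, the following is a minimal Python sketch. It is not the MAT file format or the matUtils implementation: the `Node` class, the `sample_mutations` function, the sample names, the clade label, and the mutations are all hypothetical and made up for illustration. It assumes mutations are written as reference base, 1-based position, derived base (for example `A100G`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations on the branch leading to this node, e.g. "A100G" (made up).
    mutations: List[str] = field(default_factory=list)
    # Optional label attached at a clade root (made-up label in this sketch).
    clade: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def sample_mutations(root: Node, sample: str) -> Optional[Dict[int, str]]:
    """Replay branch mutations along the root-to-sample path.

    Returns {position: base} for the named sample, or None if the sample
    is not in the tree. A later mutation at the same position overwrites
    an earlier one.
    """
    def walk(node: Node, state: Dict[int, str]) -> Optional[Dict[int, str]]:
        state = dict(state)  # copy so sibling subtrees do not share state
        for mut in node.mutations:
            position, derived = int(mut[1:-1]), mut[-1]
            state[position] = derived
        if node.name == sample:
            return state
        for child in node.children:
            found = walk(child, state)
            if found is not None:
                return found
        return None

    return walk(root, {})


# Toy tree with made-up mutations: two samples share an internal branch.
toy_tree = Node("root", children=[
    Node("internal_1", mutations=["A100G", "C200T"], clade="clade-X", children=[
        Node("sampleA", mutations=["G300A"]),
        Node("sampleB"),
    ]),
    Node("sampleC", mutations=["T400C"]),
])

if __name__ == "__main__":
    print(sample_mutations(toy_tree, "sampleA"))
```

Because shared mutations are stored once on the shared branch rather than once per sequence, a structure of this kind can stay compact even when many samples are closely related, which is the intuition behind encoding the tree with mutation-labeled branches.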

Cite

McBroome, J., Thornlow, B., Hinrichs, A. S., Kramer, A., De Maio, N., Goldman, N., … Turakhia, Y. (2021). A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees. Molecular Biology and Evolution, 38(12), 5819–5824. https://doi.org/10.1093/molbev/msab264
