The M5nr: A novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

Andreas Wilke; Travis Harrison; Jared Wilkening; Dawn Field; Elizabeth M. Glass; Nikos Kyrpides; Konstantinos Mavrommatis; Folker Meyer

Journal Article, Open Access

BMC Bioinformatics (2012) 13(1)

DOI: 10.1186/1471-2105-13-141


Abstract

Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.

Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.

Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.

© 2012 Wilke et al.; licensee BioMed Central Ltd.
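The idea in the Description can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how identical sequences are detected, so the sketch assumes each sequence is keyed by an MD5 digest of its normalized text. The function names (`seq_key`, `build_nr`, `translate_hits`) and the accessions in the example data are hypothetical.

```python
import hashlib
from collections import defaultdict


def seq_key(sequence: str) -> str:
    """Return a source-independent key for a protein sequence.

    Assumption: an MD5 digest of the upper-cased, whitespace-stripped
    sequence serves as the key; the abstract does not specify this.
    """
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse (source, accession, sequence) records into a non-redundant set.

    Returns two mappings:
      sequences:   key -> sequence (each distinct sequence stored once)
      annotations: key -> {source -> set of accessions}
    """
    sequences = {}
    annotations = defaultdict(lambda: defaultdict(set))
    for source, accession, sequence in records:
        key = seq_key(sequence)
        sequences.setdefault(key, sequence.strip().upper())
        annotations[key][source].add(accession)
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Map similarity hits (query_id, key) to accessions in one namespace.

    Hits whose sequence has no annotation in that namespace are dropped.
    """
    translated = []
    for query_id, key in hits:
        by_source = annotations.get(key, {})
        for accession in sorted(by_source.get(namespace, ())):
            translated.append((query_id, accession))
    return translated


# Hypothetical records: two sources annotate the same sequence.
records = [
    ("KEGG", "kegg:example1", "MKV"),
    ("GenBank", "gb:example2", "mkv"),
    ("GenBank", "gb:example3", "MAA"),
]
sequences, annotations = build_nr(records)

# One similarity search result, expressed against the shared key,
# can be reported in either namespace without recomputation.
hits = [("read1", seq_key("MKV"))]
kegg_view = translate_hits(hits, annotations, "KEGG")
genbank_view = translate_hits(hits, annotations, "GenBank")
```

Because hits refer to the shared key rather than to any one source's accession, the same search output yields both `kegg_view` and `genbank_view`, which is the "multiple result sets using a single computation" point made in the Conclusions.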

Wilke, A., Harrison, T., Wilkening, J., Field, D., Glass, E. M., Kyrpides, N., … Meyer, F. (2012). The M5nr: A novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC Bioinformatics, 13(1). https://doi.org/10.1186/1471-2105-13-141
