UniProt: The universal protein knowledgebase

Alex Bateman; Maria Jesus Martin; Claire O'Donovan; Michele Magrane; Emanuele Alpi; Ricardo Antunes; Benoit Bely; Mark Bingley; Carlos Bonilla; Ramona Britto; Borisas Bursteinas; Hema Bye-A-Jee; Andrew Cowley; Alan Da Silva; Maurizio De Giorgi; Tunca Dogan; Francesco Fazzini; Leyla Garcia Castro; Luis Figueira; Penelope Garmiri; George Georghiou; Daniel Gonzalez; Emma Hatton-Ellis; Weizhong Li; Wudong Liu; Rodrigo Lopez; Jie Luo; Yvonne Lussi; Alistair MacDougall; Andrew Nightingale; Barbara Palka; Klemens Pichler; Diego Poggioli; Sangya Pundir; Luis Pureza; Guoying Qi; Steven Rosanoff; Rabie Saidi; Tony Sawford; Aleksandra Shypitsyna; Elena Speretta; Edward Turner; Nidhi Tyagi; Vladimir Volynkin; Tony Wardell; Kate Warner; Xavier Watkins; Rossana Zaru; Hermann Zellner; Ioannis Xenarios; Lydie Bougueleret; Alan Bridge; Sylvain Poux; Nicole Redaschi; Lucila Aimo; Ghislaine Argoud-Puy; Andrea Auchincloss; Kristian Axelsen; Parit Bansal; Delphine Baratin; Marie Claude Blatter; Brigitte Boeckmann; Jerven Bolleman; Emmanuel Boutet; Lionel Breuza; Cristina Casal-Casas; Edouard De Castro; Elisabeth Coudert; Beatrice Cuche; Mikael Doche; Dolnide Dornevil; Severine Duvaud; Anne Estreicher; Livia Famiglietti; Marc Feuermann; Elisabeth Gasteiger; Sebastien Gehant; Vivienne Gerritsen; Arnaud Gos; Nadine Gruaz-Gumowski; Ursula Hinz; Chantal Hulo; Florence Jungo; Guillaume Keller; Vicente Lara; Philippe Lemercier; Damien Lieberherr; Thierry Lombardot; Xavier Martin; Patrick Masson; Anne Morgat; Teresa Neto; Nevila Nouspikel; Salvo Paesano; Ivo Pedruzzi; Sandrine Pilbout; Monica Pozzato; Manuela Pruess; Catherine Rivoire; Bernd Roechert; Michel Schneider; Christian Sigrist; Karin Sonesson; Sylvie Staehli; Andre Stutz; Shyamala Sundaram; Michael Tognolli; Laure Verbregue; Anne Lise Veuthey; Cathy H. Wu; Cecilia N. Arighi; Leslie Arminski; Chuming Chen; Yongxing Chen; John S. Garavelli; Hongzhan Huang; Kati Laiho; Peter McGarvey; Darren A. Natale; Karen Ross; C. R. Vinayaka; Qinghua Wang; Yuqi Wang; Lai Su Yeh; Jian Zhang

Journal Article, Open Access

Nucleic Acids Research (2017) 45(D1) D158-D169

DOI: 10.1093/nar/gkw1099

Abstract

The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.
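
As an illustration of the kind of query the SPARQL endpoint mentioned above accepts, the following is a minimal sketch rather than an official client. It builds a request that follows the standard SPARQL protocol (query text passed in a `query` parameter, JSON results requested through the `Accept` header). The endpoint URL is the one given in the abstract and may have moved since publication. The vocabulary terms `up:Protein` and `up:reviewed` and the helper names `build_request` and `run_query` are assumptions made for this example and should be checked against the endpoint's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint URL as given in the abstract; it may have changed since 2017.
ENDPOINT = "http://sparql.uniprot.org/"

# Assumption: the UniProt core vocabulary uses this namespace and exposes
# up:Protein and up:reviewed. Verify against the endpoint documentation.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
           up:reviewed true .
}
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request with the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the result bindings (needs network access)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` would, under these assumptions, return a list of bindings whose `protein` entries hold the IRIs of expert-curated entries. Building the request separately from sending it makes the query easy to inspect without network access.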

Citation (APA)

Bateman, A., Martin, M. J., O’Donovan, C., Magrane, M., Alpi, E., Antunes, R., … Zhang, J. (2017). UniProt: The universal protein knowledgebase. Nucleic Acids Research, 45(D1), D158–D169. https://doi.org/10.1093/nar/gkw1099
