Automatically extracting functionally equivalent proteins from SwissProt

This article is free to access.

Abstract

Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. Although orthology usually implies functional equivalence, this is not always true; datasets of orthologous proteins are therefore not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs, for example all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text-mining approach.

Results: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.

Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text mining of UniProtKB/Swiss-Prot.

© 2008 McMillan and Martin; licensee BioMed Central Ltd.
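
The abstract describes the method only in outline: build a candidate list of homologues, then filter out functionally diverged proteins by comparing their annotations. The Python sketch below illustrates that general idea and is not FOSTA's actual implementation. It assumes the candidate homologues have already been found (for example by a sequence search), and it compares normalised description strings for exact equality. The `Entry` class, the list of ignored qualifier words, the rule that excludes same-species candidates, and the example accessions are all hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A minimal stand-in for a protein record (hypothetical structure)."""
    accession: str
    species: str
    description: str


# Qualifier words assumed not to change the stated function (illustrative list).
_IGNORED = re.compile(r"\b(precursor|fragment|putative|probable)\b")


def normalise(description: str) -> str:
    """Lower-case a description, drop qualifier words and punctuation."""
    text = _IGNORED.sub(" ", description.lower())
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def filter_equivalent(root: Entry, candidates: list[Entry]) -> list[Entry]:
    """Keep candidates from other species whose normalised description
    matches that of the root protein."""
    target = normalise(root.description)
    return [
        c for c in candidates
        if c.species != root.species and normalise(c.description) == target
    ]


# Illustrative data only: accessions are placeholders, not real identifiers.
root = Entry("HYPO_HUMAN", "Homo sapiens", "Vitamin K-dependent protein C precursor")
candidates = [
    Entry("HYPO_MOUSE", "Mus musculus", "Vitamin K-dependent protein C"),
    Entry("HYPO_BOVIN", "Bos taurus", "Coagulation factor X precursor"),
]

for hit in filter_equivalent(root, candidates):
    print(hit.accession, hit.species)
```

Exact matching after normalisation is deliberately strict: it will miss equivalent proteins whose descriptions are worded differently, which is the kind of annotation inconsistency the abstract says FOSTA can highlight.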

McMillan, L. E. M., & Martin, A. C. R. (2008). Automatically extracting functionally equivalent proteins from SwissProt. BMC Bioinformatics, 9. https://doi.org/10.1186/1471-2105-9-418
