Validation of coevolving residue algorithms via pipeline sensitivity analysis: ELSC and OMES and ZNMI, oh my!

Abstract

Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard-to-control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second, and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions. © 2010 Brown, Brown.
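
The reproducibility idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not implement ZNMI: as a stand-in pair score it uses plain mutual information divided by joint entropy, and the function names (`normalized_mi`, `pair_scores`, `top_pairs`, `reproducibility`), the random half-split scheme, and the Jaccard overlap of top-scoring pairs are all assumptions chosen for illustration. The accuracy measure (distance between coupled residues in tertiary structure) is omitted because it would require structural coordinates.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_mi(col_i, col_j):
    """Mutual information of two columns divided by their joint entropy.

    Stand-in score for illustration only; this is NOT the paper's ZNMI.
    """
    h_joint = entropy(list(zip(col_i, col_j)))
    if h_joint == 0.0:
        return 0.0  # both columns fully conserved: no covariation signal
    mi = entropy(col_i) + entropy(col_j) - h_joint
    return mi / h_joint


def pair_scores(alignment):
    """Score every column pair of an alignment (list of equal-length strings)."""
    columns = list(zip(*alignment))
    return {
        (i, j): normalized_mi(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


def top_pairs(scores, k):
    """Return the set of the k highest-scoring column pairs."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def reproducibility(alignment, n_splits=20, k=1, seed=0):
    """Mean Jaccard overlap of top-k pairs between random half-alignments.

    Hypothetical perturbation scheme: shuffle the sequences, split them
    into two disjoint halves, score each half independently, and compare.
    """
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_splits):
        rows = list(alignment)
        rng.shuffle(rows)
        half = len(rows) // 2
        a = top_pairs(pair_scores(rows[:half]), k)
        b = top_pairs(pair_scores(rows[half:]), k)
        overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps)


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is conserved.
toy_alignment = ["AAG", "CCG", "AAG", "CCG", "GGG", "TTG", "GGG", "TTG"]
toy_scores = pair_scores(toy_alignment)
toy_reproducibility = reproducibility(toy_alignment)
```

On this toy input the covarying pair `(0, 1)` is the top-scoring pair in every half-alignment, so the overlap is maximal. On real alignments, a low overlap would suggest that predictions drawn from a single alignment are sensitive to which sequences were included.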

Brown, C. A., & Brown, K. S. (2010). Validation of coevolving residue algorithms via pipeline sensitivity analysis: ELSC and OMES and ZNMI, oh my! PLoS ONE, 5(6). https://doi.org/10.1371/journal.pone.0010779

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 29

52%

Researcher 20

36%

Professor / Associate Prof. 7

13%

Readers' Discipline

Tooltip

Agricultural and Biological Sciences 33

63%

Biochemistry, Genetics and Molecular Bi... 13

25%

Computer Science 3

6%

Engineering 3

6%

Save time finding and organizing research with Mendeley

Sign up for free