Sequence comparison remains a powerful tool to assess the structural relatedness of two proteins. To develop a sensitive sequence-based procedure for fold recognition, we performed an exhaustive global alignment (with zero end gap penalties) between sequences of protein domains with known three-dimensional folds. The subset of 1.3 million alignments between sequences of structurally unrelated domains was used to derive a set of analytical functions that represent the probability of structural significance for any sequence alignment at a given sequence identity, sequence similarity and alignment score. Analysis of overlap between structurally significant and insignificant alignments shows that sequence identity and sequence similarity measures are poor indicators of structural relatedness in the 'twilight zone', while the alignment score allows much better discrimination between alignments of structurally related and unrelated sequences for a wide variety of alignment settings. A fold recognition benchmark was used to compare eight different substitution matrices with eight sets of gap penalties. The best performing matrices were Gonnet and Blosum50 with normalized gap penalties of 2.4/0.15 and 2.0/0.15, respectively, while the positive matrices were the worst performers. The derived functions and parameters can be used for fold recognition via a multilink chain of probability weighted pairwise sequence alignments.
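The "global alignment with zero end gap penalties" mentioned above can be illustrated with a minimal dynamic-programming sketch. This is not the authors' implementation. The function name `semiglobal_score`, the simple match/mismatch scoring, and the linear gap penalty are all assumptions made for brevity, whereas the study itself used substitution matrices such as Gonnet and Blosum50 with paired gap-penalty values. The sketch only shows how leaving end gaps unpenalized changes the recurrence: the first row and column start at zero, and the final score is taken as the best value in the last row or last column.

```python
def semiglobal_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score a global alignment of a and b in which end gaps cost nothing.

    Assumptions (not from the paper): identity-based scoring and a linear
    gap penalty applied to internal gaps only.
    """
    # Row 0 is all zeros: leading gaps in `a` are free.
    prev = [0.0] * (len(b) + 1)
    best = 0.0  # best score seen in the last column so far
    for ca in a:
        # Column 0 is zero: leading gaps in `b` are free.
        cur = [0.0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur.append(max(diag, up, left))
        # Ending in the last column means the rest of `a` is a free end gap.
        best = max(best, cur[-1])
        prev = cur
    # Ending anywhere in the last row means the rest of `b` is a free end gap.
    return max(best, max(prev))


if __name__ == "__main__":
    print(semiglobal_score("ACGT", "TTACGTTT"))
```

With these assumed scores, a short sequence embedded in a longer one scores the same as a perfect match, because the overhanging residues fall into unpenalized end gaps.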
Abagyan, R. A., & Batalov, S. (1997). Do aligned sequences share the same fold? Journal of Molecular Biology, 273(1), 355–368. https://doi.org/10.1006/jmbi.1997.1287