As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant amount of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
Baeten, L., Serrano, L., Lenaerts, T., Reumers, J., Tur, V., Schymkowitz, J., … Stricher, F. (2008). Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments. PLoS Computational Biology, 4(5), e1000083. https://doi.org/10.1371/journal.pcbi.1000083