Protein sequence evolution is a complex process that varies across the tree of life and among sites within proteins. Comparing evolutionary rate matrices for specific taxa ('clade-specific models') can reveal this variation and provide information about the basis for changes in the patterns of protein evolution over time. However, clade-specific models can only provide this information if the variation among taxa exceeds the variation among proteins. We showed this to be the case by demonstrating that clade-specific model fit could distinguish among proteins from the four taxa that we examined (vertebrates, plants, oomycetes, and yeasts). Model fit classified proteins correctly by clade of origin >70% of the time. A relatively small number of dimensions can explain differences among models. If model parameters are averaged across all sites, ∼80% of the variance among models reflects clade; for models that consider protein structure, ∼50% of the variance reflects relative solvent accessibility and ∼25% reflects clade. Relaxed purifying selection in taxa with smaller long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with the degree of change in amino acid side-chain volume for substitutions; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference. Availability: model files are available from https://github.com/ebraun68/clade_specific_prot_models.
CITATION STYLE
Pandey, A., & Braun, E. L. (2020). Protein evolution is structure dependent and non-homogeneous across the tree of life. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020. Association for Computing Machinery, Inc. https://doi.org/10.1145/3388440.3412473