Computational Methods in Protein Evolution

Abstract

Proteins are the most versatile kind of molecule that we know and the result of a long evolutionary process. During this process, countless rearranging, mutating, and replicating strands of DNA have, on the one hand, managed to encode and conserve proteins that allow them to replicate and stay intact and, on the other hand, have allowed their proteins to change and ultimately help them replicate more than other strands of DNA. All cells make proteins in their protein factories called ribosomes, where the sequence of a gene, transcribed from DNA into messenger RNA, is translated according to the ancient genetic code into strings of amino acids, which follow the laws of thermodynamics and molecular forces to fold up into specific wobbly three-dimensional shapes. Protein evolution happens whenever an accidental “typo” (a mutation) in the gene is translated into a modified protein, and that protein is released into the busy commotion within the cell, packed within a dense soup of other molecules in water. Whatever this new protein does differently than its predecessor can determine the fate of that mutation, making it either an essential innovation, a terrible mistake that gets erased, or something that just stays around for a while without being noticed, maybe to play a role in the distant future.

This book is a compilation of methods that can be applied to various problems related to protein sequence and structure. It is a diverse collection of approaches ranging from broad concepts (“protein space”) to very specific applications (“antibody modeling”).

The term “evolution” is used slightly differently in various fields of science. While evolutionary biologists think about the natural process of Darwinian evolution (and other post-Darwinian forms of evolution of organisms living in populations and environments), biochemists take a more design-oriented approach to evolution, using the evolutionary process in vitro or in silico to make proteins with certain desired properties.
Physicists, on the other hand, use the term evolution to describe a continuous process in time that changes a system from one state to another. While physics plays a significant role in this book, it is the first two notions of evolution that will be described in the following chapters.

Evolutionary research has made extensive use of computers. While the result of evolution can be readily studied at the macroscopic, phenotypic level, evolutionary biology has always had a strong theoretical component, since for a long time the actual process could rarely be observed directly. The underlying patterns of inheritance and the interplay between geography and population dynamics have been described in mathematical terms and have always accompanied the progress made in the Molecular Biology of cells that eventually elucidated the core mechanisms of inheritance: the information stored in DNA and how it is replicated and passed on, imperfectly, to future generations. The field of Bioinformatics was born as soon as the first sequences of genes and proteins had been published in large enough quantity to be amenable to direct sequence-to-sequence comparisons. The fields of Molecular Evolution and Phylogenetics were close companions of this development, in which mathematical models and computational algorithms were combined to reconstruct the most likely evolutionary history given the observed DNA sequences. Protein sequences came as a free by-product, because the amino acid sequence can be readily translated from DNA using the almost universal genetic code. DNA sequences became the main source material of molecular evolution research for quite a while, further spurred by the Human Genome Project and later by the data explosion that came with next-generation sequencing. Evolutionary relationships within populations and among species were revealed in ever greater detail.

No matter how much genetic sequence data has become available, there are still many aspects of how genetics translates into observable (phenotypic) changes that cannot be understood at that level of description. Network science is another toolkit rooted in math and computation that is used to study evolution at the interface between genotype and phenotype. There are networks representing physical and chemical molecular interactions within a cell, the flow of information and cell-level “computation” and communication, as well as more abstract networks describing the relationships and similarities between gene and protein sequences, including the entire “universe” of known proteins. While biological network science (often called systems biology) comes close to providing a working model of the cellular phenotype, the real “gap” in understanding where a mutation in the DNA sequence makes a difference to the survival and fitness of an entire organism is how physical interactions, the “edges” or connections in systems biology networks, result from biophysical properties of proteins, which can be altered by mutations. It is this point, where changes of DNA translate into altered protein structure and function, that most of the methods in this book focus on.

While Molecular Evolution was a backward-facing, almost historical, discipline in its early days, it has increasingly matured into an “applicable” science due to its intersections with Biochemistry and Biophysics. Protein evolution is therefore much more than just the description of evolutionary relationships based on sequence differences. It has become a powerful tool for interfering with the evolution of pathogens, for devising therapies against mutation-based diseases such as cancers, and for designing novel enzymes with properties that can go beyond naturally evolved functions.
Methods from evolution can be easily applied whenever genetic variation is at play, and this variation is what makes all humans unique and sometimes even determines why diseases and infections affect each of us differently.

While each chapter in this book is the unique work of its authors and there is no predefined “narrative” to this book, some common themes become apparent. The first theme is that of mutations of single amino acids, i.e., point mutations. Predicting their effect on the physical structure of a protein is an important capability that links the abundance of sequence information with the comparatively few known structures (Chapters 1 and 2). Other mutational mechanisms lead to gene duplication (Chapter 3) and even de novo emergence of new genes (Chapter 4). Likewise, the understanding of pairwise correlated mutations can be used to reveal structure information where none is available, because the fates of spatially close (and physically interacting) amino acids are evolutionarily linked and coevolve (Chapters 5, 6, and 7). Going back into evolutionary history, the structure and function of ancestral proteins can be reconstructed and used productively, since these may bear functions similar to those of their extant descendants yet may also have some new functional properties (Chapters 8 and 9). Many formerly sequence-based methods such as sequence alignments and phylogenies can be improved by applying a more structural and biophysical viewpoint (Chapters 10 and 11). Instead of exploring similar proteins along evolutionary time, one can of course also compare existing proteins based on their similarity in sequence and structure. A number of classification schemes for organizing all known proteins exist, and it is possible to explore an entire “protein universe,” often by breaking full proteins into even smaller building units called domains (Chapters 12, 13, 14, 15, and 16).
Homology modeling makes use of these similarities by fitting the sequences of proteins without known structure to the known structures of proteins with similar sequence (Chapter 17). This structure prediction can also be extended to protein-protein interactions (Chapter 18), and even some structural properties of proteins lacking a fixed structure, i.e., disordered or unstructured proteins, can be predicted (Chapter 19). Another important aspect related to disorder is the intrinsic dynamic nature of folded proteins, which always exist as an ensemble of conformations, some of which become favored or disfavored with evolutionary changes (Chapter 20). Finally, evolutionary principles are at work in shaping such versatile proteins as antibodies or enzymes, which can also be designed in silico to have certain properties by applying directed evolution, i.e., where the evolutionary endpoint, but not its path, is determined by the researcher (Chapters 21 and 22).

The book covers a wide range of computational approaches, including the dynamic programming techniques of sequence alignments, the clustering methods of phylogenies, physics-based approaches such as molecular dynamics simulations, and a range of statistical, graph-based, and machine learning methods. While the authors take the time to give some background and references in the introductory sections, this book is not a textbook, and more detailed descriptions of underlying theory and algorithms may have to be found elsewhere. Nevertheless, I think that there is a lot to be learned from this book for an interdisciplinary readership. I sincerely hope that this book offers many useful workflows and techniques that help researchers and students working with proteins computationally. I also strongly encourage the reader to go beyond the individual protocol and mix and match the different methods to come up with new, innovative solutions. That’s what evolution would do.

Heidelberg, Germany
Tobias Sikosek

Cite

Sikosek, T. (Ed.). (2019). Computational Methods in Protein Evolution (Vol. 1851). Springer New York. https://doi.org/10.1007/978-1-4939-8736-8
