The recent drop in genome sequencing costs has opened a promising horizon for the development of genomic medicine. In the biomedical setting, sequencing data are increasingly used for disease diagnosis and prognosis, treatment development, counseling, and so on. Many of these applications rely on the identification of disease-causing variants. This is a particularly challenging problem because of the large number and wide variety of sequence variants identified in sequencing projects, and because we have only a limited understanding of the physicochemical and biochemical properties that differentiate neutral from pathological variants. Nonetheless, recent years have witnessed important methodological advances for one class of variants: those that change the amino-acid sequence of proteins. Proteins are a main constituent of living systems. Although their biological properties are essentially determined by the amino-acid sequence, not all changes in this sequence have the same impact. Some are neutral, whereas others affect protein function and lead to disease. A large body of evidence shows that which of the two is the case depends on properties such as the location of the mutation in the protein structure, interspecies conservation, and so on. Mutation prediction methods based on these features have good success rates, in the 70–90% range, although the trend of these rates over time suggests a performance plateau that would limit their applicability. In light of the most recent advances in the field, and after reviewing the foundations of prediction methods, we discuss the existence of this performance threshold and how it can be overcome.