DDIG-in: Detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels

Abstract

Motivation: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem.

Results: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques.
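For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and MCC are computed from a binary confusion matrix, with disease-causing variants as the positive class. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the class balance of the authors' dataset is not assumed here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of positive (disease-causing) variants predicted as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of negative (neutral) variants predicted as negative."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is set to 0 when any row or column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT from the paper): 100 positives and 100 negatives.
tp, fn = 86, 14
tn, fp = 72, 28

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"MCC         = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike sensitivity and specificity, MCC depends on the ratio of positive to negative examples, so the same sensitivity and specificity can correspond to different MCC values on differently balanced test sets.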

Cite

Folkman, L., Yang, Y., Li, Z., Stantic, B., Sattar, A., Mort, M., … Zhou, Y. (2015). DDIG-in: Detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics, 31(10), 1599–1606. https://doi.org/10.1093/bioinformatics/btu862
