A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining

  • Sen T
  • Cheng H
  • Kloczkowski A
  • et al.
18Citations
Citations of this article
21Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross‐validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q 3 ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.

Cite

CITATION STYLE

APA

Sen, T. Z., Cheng, H., Kloczkowski, A., & Jernigan, R. L. (2006). A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Science, 15(11), 2499–2506. https://doi.org/10.1110/ps.062125306

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free