A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining

Taner Z. Sen; Haitao Cheng; Andrzej Kloczkowski; Robert L. Jernigan

Journal Article, Open Access

Protein Science (2006) 15(11) 2499-2506

DOI: 10.1110/ps.062125306

Abstract

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may benefit significantly from more accurate secondary structure predictions. Although many secondary structure prediction methods are available in the literature, their cross‐validated prediction accuracy is generally <80%. To increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank (PDB) structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a chosen sequence identity threshold, the FDM method is applied for the prediction. The remaining fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method, measured by Q3, ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, this consensus method is expected to improve because it will rely more upon the structural fragments.
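The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `cdm_predict`, `q3`, `fdm_predict`, and `gorv_predict` are hypothetical, the two predictors are passed in as placeholder callables, and the fragments are assumed to be non-overlapping so that their predictions can simply be concatenated. Q3 is computed in the usual way, as the percentage of residues whose three-state label (H, E, or C) is predicted correctly.

```python
from typing import Callable, List, Optional, Tuple

# A predictor maps a fragment's amino acid sequence to a string of
# three-state labels (H, E, C) of the same length. Both predictors
# used here are hypothetical stand-ins for FDM and GOR V.
Predictor = Callable[[str], str]

# Each fragment carries its sequence and the best sequence identity
# (0.0 to 1.0) found against PDB fragments, or None if no match exists.
Fragment = Tuple[str, Optional[float]]


def cdm_predict(
    fragments: List[Fragment],
    fdm_predict: Predictor,
    gorv_predict: Predictor,
    threshold: float = 0.5,  # the abstract reports 50% as the optimum
) -> str:
    """Choose FDM for fragments above the identity threshold, GOR V otherwise.

    Assumption: fragments do not overlap, so per-fragment predictions
    are concatenated in order.
    """
    parts = []
    for sequence, identity in fragments:
        if identity is not None and identity > threshold:
            parts.append(fdm_predict(sequence))
        else:
            parts.append(gorv_predict(sequence))
    return "".join(parts)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy stand-ins: "FDM" always predicts helix, "GOR V" always predicts coil.
    toy_fdm = lambda seq: "H" * len(seq)
    toy_gorv = lambda seq: "C" * len(seq)
    toy_fragments = [("ACDEF", 0.8), ("GHIKL", 0.3), ("MNPQR", None)]
    prediction = cdm_predict(toy_fragments, toy_fdm, toy_gorv)
    print(prediction, q3(prediction, "HHHHHCCCCCEEEEE"))
```

In this sketch the comparison is strict, following the abstract's wording that FDM applies to fragments "above" the threshold. A fragment with exactly 50% identity therefore falls back to the GOR V stand-in.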

Citation (APA)

Sen, T. Z., Cheng, H., Kloczkowski, A., & Jernigan, R. L. (2006). A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Science, 15(11), 2499–2506. https://doi.org/10.1110/ps.062125306
