Testing statistical significance scores of sequence comparison methods with structure similarity

10Citations
Citations of this article
45Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested if this could be validated when applied to existing, evolutionary related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons. © 2006 Hulsen et al; licensee BioMed Central Ltd.

Cite

CITATION STYLE

APA

Hulsen, T., de Vlieg, J., Leunissen, J. A. M., & Groenen, P. M. A. (2006). Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics, 7. https://doi.org/10.1186/1471-2105-7-444

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free