Pseudo-periodic partitions of biological sequences

Lugang Li; Renchao Jin; Poh Lin Kok; Honghui Wan

Journal Article (Open Access)

Bioinformatics (2004) 20(3) 295-306

DOI: 10.1093/bioinformatics/btg404

Abstract

Motivation: Developing algorithms that find typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), is at the core of many problems in biological sequence and structure analysis. One of the most significant features of biological sequences is their high quasi-repetitiveness, and variation in the quasi-repetitiveness of genomic and proteomic texts indicates the presence and density of different biologically important information. Sensitive, automatic computational methods for identifying pseudo-periodic regions of sequences are therefore important: through them we can infer, describe and understand biological properties, and seek precise molecular details of biological structures, dynamics, interactions and evolution.

Results: We develop a novel computational tool for partitioning a sequence into pseudo-periodic regions. A pseudo-periodic partition is defined as a partition that, intuitively, has minimal bias relative to some perfect-periodic partition of the sequence, based on the evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition of a given sequence; the partition corresponds to the shortest path in the main diagonal of the directed (acyclic) weighted graph constructed by the Smith-Waterman self-alignment of the sequence. We use several typical examples to demonstrate the use of our algorithm and software system in detecting functional or structural domains and regions of proteins. A major advantage of the software is that it has a tunable parameter, the granularity factor, and any biological sequence family can be chosen as a training set to determine the best value of that parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set. We show that the granularity factor is 0.52 and that the average agreement accuracy of the pseudo-periodic partitions detected by our software for all pseudo-repeats in the SWISS-PROT database is as high as 97.6%.

© Oxford University Press 2004; All rights reserved.
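
As a rough illustration of the Smith-Waterman self-alignment that the abstract names as the basis of the graph construction, the following Python sketch fills a local-alignment score matrix for a sequence aligned against itself. It is not the authors' algorithm: it does not build the weighted graph, compute the shortest path, or use the granularity factor. The function name `self_alignment_scores` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen only for illustration. The sketch does show where the quadratic time and space cost comes from.

```python
def self_alignment_scores(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment of seq against itself.

    Only cells above the main diagonal (i < j) are filled, so the trivial
    alignment of each residue with itself is excluded. Scoring values are
    illustrative assumptions, not parameters taken from the paper.
    """
    n = len(seq)
    # (n + 1) x (n + 1) matrix: quadratic space, filled in quadratic time.
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # local alignment: never go negative
                H[i - 1][j - 1] + s,  # align seq[i-1] with seq[j-1]
                H[i - 1][j] + gap,    # gap in the second copy
                H[i][j - 1] + gap,    # gap in the first copy
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return H, best, best_cell


H, best, cell = self_alignment_scores("ABCABC")
# The best-scoring cell lies on the diagonal j - i = 3, the repeat period.
print(best, cell, cell[1] - cell[0])
```

In this toy input the offset `j - i` of the best cell matches the length of the repeated unit `ABC`; for real pseudo-repeats with substitutions and indels the high-scoring cells drift across neighbouring diagonals.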

Citation (APA)

Li, L., Jin, R., Kok, P. L., & Wan, H. (2004). Pseudo-periodic partitions of biological sequences. Bioinformatics, 20(3), 295–306. https://doi.org/10.1093/bioinformatics/btg404
