Motivation: Alternative splicing (AS) serves as a mechanism to create diversity among functional proteins. Increasing evidence indicates that a large portion of genes have AS forms. Hence AS variants should be considered while analyzing gene structures.

Results: A new cross-species gene identification and AS analysis system, PSEP, has been developed. The system is based on expressed sequence tag (EST)-to-genome and genome-to-genome comparisons and is implemented in two steps: sequence alignment and a series of post-alignment processes, including progressive signal extraction and patching. For gene identification, these post-alignment processes serve as noise filters and enable PSEP to eliminate ∼88% of potential overprediction. The overall accuracy of PSEP is better than or comparable to that of other well-known cross-species gene prediction programs, including the ROSETTA program, TWINSCAN, SGP-1/-2 and SLAM, when tested on three benchmark datasets (the ELN gene region, the HoxA cluster and the ROSETTA set). In addition, 76.2 and 76.0% of multiple-exon genes in the ROSETTA dataset and human chromosome 20, respectively, are found to have AS forms. Approximately 23% of the 210 elementary alternatives identified in the ROSETTA dataset are not conserved between the human and mouse genomes, and none of the 210 transcripts is found in the RefSeq annotation. With its dual functions in cross-species conserved sequence analysis and AS analysis, PSEP is highly suitable for studying the evolution of AS patterns and for finding unidentified gene expression features.

© Oxford University Press 2004; all rights reserved.
CITATION
Chuang, T. J., Chen, F. C., & Chou, M. Y. (2004). A comparative method for identification of gene structures and alternatively spliced variants. Bioinformatics, 20(17), 3064–3079. https://doi.org/10.1093/bioinformatics/bth368