Improving the specificity of exon prediction using comparative genomics

Abstract

Background: Computational gene prediction tools routinely generate large numbers of predicted coding exons (putative exons). A common limitation of these tools is relatively low specificity, owing to the large proportion of non-coding sequence in genomes.

Methods: A statistical approach is developed that substantially improves gene prediction specificity. The key idea is to apply the principle of evolutionary conservation to coding exons. First, by exploiting the homology between the genomes of two related species, a probability model is developed for the evolutionary conservation pattern of codons across the two genomes. A probability model for the dependency between adjacent codons (triplets) is then added to distinguish coding exons from random sequence. Finally, a log odds ratio is constructed to classify putative exons as either coding exons or non-coding regions.

Results: The method was tested on pre-aligned human-mouse sequences in which the putative exons were predicted by GENSCAN and TWINSCAN. It improves exon specificity by 73% and 32% for the GENSCAN and TWINSCAN predictions, respectively, with a loss in sensitivity of at most 1%. The method also retains 98% of the RefSeq gene structures correctly predicted by TWINSCAN while removing 26% of the predicted genes that lie in non-coding regions. The estimated number of true exons among TWINSCAN's predictions is 157,070. The results and the executable code can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/.

Conclusion: The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method that can serve as an additional criterion for refining many existing gene predictions.

© 2008 Wu; licensee BioMed Central Ltd.
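The abstract describes the classifier only at a high level. As a rough illustration, the Python sketch below shows one way such a log odds score could be assembled; it is not the author's implementation (the executable code is at the URL above). It assumes in-frame human-mouse alignments and fits two hypothetical models, one on coding and one on non-coding training alignments. Each model scores an alignment as a marginal frequency for the first human codon, a first-order Markov chain over adjacent human codons (the adjacent-codon dependency), and a substitution probability P(mouse codon | human codon) (the conservation pattern). All names (`CodonPairModel`, `log_odds`, `is_coding`, `exon_sn_sp`), the add-one smoothing, and the zero threshold are illustrative assumptions. The helper `exon_sn_sp` uses the common exon-level convention (sensitivity = true exons / actual exons, specificity = true exons / predicted exons), which the paper may define differently.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 triplets


def codon_pairs(human, mouse):
    """Split two in-frame aligned sequences into (human, mouse) codon pairs.

    Pairs containing gaps or ambiguous bases are dropped (a simplification).
    """
    n = min(len(human), len(mouse)) // 3 * 3
    pairs = [(human[i:i + 3], mouse[i:i + 3]) for i in range(0, n, 3)]
    return [(h, m) for h, m in pairs if set(h + m) <= set(BASES)]


def smoothed_conditional(counts, alpha):
    """Return log P(b | a) for every codon pair (a, b), with add-alpha smoothing."""
    totals = Counter()
    for (a, _), c in counts.items():
        totals[a] += c
    return {
        (a, b): math.log((counts[(a, b)] + alpha) / (totals[a] + alpha * len(CODONS)))
        for a in CODONS
        for b in CODONS
    }


class CodonPairModel:
    """Hypothetical toy model of aligned coding (or non-coding) sequence.

    log P = log P(h_1) + sum log P(h_i | h_{i-1}) + sum log P(m_i | h_i)
    """

    def __init__(self, training, alpha=1.0):
        start, trans, subst = Counter(), Counter(), Counter()
        for human, mouse in training:
            pairs = codon_pairs(human, mouse)
            hc = [h for h, _ in pairs]
            start.update(hc)               # marginal human codon frequency
            trans.update(zip(hc, hc[1:]))  # adjacent-codon dependency
            subst.update(pairs)            # human -> mouse conservation pattern
        total = sum(start.values())
        self.log_start = {
            c: math.log((start[c] + alpha) / (total + alpha * len(CODONS))) for c in CODONS
        }
        self.log_trans = smoothed_conditional(trans, alpha)
        self.log_subst = smoothed_conditional(subst, alpha)

    def log_likelihood(self, human, mouse):
        pairs = codon_pairs(human, mouse)
        if not pairs:
            return 0.0
        hc = [h for h, _ in pairs]
        ll = self.log_start[hc[0]]
        ll += sum(self.log_trans[t] for t in zip(hc, hc[1:]))
        ll += sum(self.log_subst[p] for p in pairs)
        return ll


def log_odds(coding, noncoding, human, mouse):
    """Log odds ratio: log P(alignment | coding) - log P(alignment | non-coding)."""
    return coding.log_likelihood(human, mouse) - noncoding.log_likelihood(human, mouse)


def is_coding(coding, noncoding, human, mouse, threshold=0.0):
    """Keep a putative exon only if its log odds exceeds the (assumed) threshold."""
    return log_odds(coding, noncoding, human, mouse) > threshold


def exon_sn_sp(predicted, actual):
    """Exon-level sensitivity (TE/AE) and specificity (TE/PE), exact-match exons."""
    true_exons = len(set(predicted) & set(actual))
    return true_exons / len(actual), true_exons / len(predicted)


# Toy usage: conserved alignments as "coding", unconserved ones as "non-coding".
coding = CodonPairModel([("ATGGCTGCTAAA", "ATGGCTGCTAAA")] * 5)
noncoding = CodonPairModel([("ATGGCTGCTAAA", "TTTCCCGGGTTT")] * 5)
print(is_coding(coding, noncoding, "ATGGCTGCTAAA", "ATGGCTGCTAAA"))  # True
print(is_coding(coding, noncoding, "ATGGCTGCTAAA", "TTTCCCGGGTTT"))  # False
# Three predicted exons, two of them real: sensitivity 1.0, specificity 2/3.
print(exon_sn_sp([(1, 90), (200, 310), (500, 560)], [(1, 90), (200, 310)]))
```

In the toy data the human side is identical in both training sets, so the start and Markov terms cancel and only the conservation term decides the call. The last line shows why a filter of this kind can raise specificity at little cost to sensitivity: discarding the spurious third exon would lift specificity to 1.0 without lowering sensitivity, which falls only when a true exon is removed.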

Citation (APA): Wu, J. (2008). Improving the specificity of exon prediction using comparative genomics. In BMC Genomics (Vol. 9). https://doi.org/10.1186/1471-2164-9-S2-S13
