Motivation: We developed GeneGenerator because of the need for a tool to predict gene structure without knowing in advance how to score potential exons and introns in order to obtain the best results, pertinent in particular to less well-studied organisms for which suitable training sets are small. GeneGenerator is a very flexible algorithm which for a given genomic sequence generates a number of feasible gene structures satisfying user-defined constraints. The specific implementation described in detail requires minimum scoring for translation start and donor and acceptor splice sites according to previously trained logitlinear models. In addition, potential exons and introns are required to exceed specified minimal lengths and threshold scores for coding or non-coding potential derived as log-likelihood ratios of appropriate Markov sequence models. Results: A database of 46 non-redundant genomic sequences from maize is used for illustration. It is shown that the correct gene structures do not always maximize the considered target function. However, in most cases, the correct or nearly correct structures are found in a small set of high-scoring structures. A critical review of the generated structures sometimes allows the choices to be narrowed by considering additional variables such as predicted splice site strength or local optimality of splice site scores. Summary statistics for prediction accuracy over all 46 maize genes are derived under cross-validation and non-cross-validation training conditions for the Markov sequence models. The algorithm achieved exon sensitivity of 0.81 and specificity of 0.75 on an independent set of 14 novel maize genomic segments. Availability: GeneGenerator runs under Borland-Pascal 7.0 using MS-DOS and C on UNIX work stations. The source code is available upon request. Contact: jkleffe@@@euler.grumed.fu-berlin-de.
CITATION STYLE
Kleffe, J., Hermann, K., Vahrson, W., Wittig, B., & Brendel, V. (1998). GeneGenerator - A flexible algorithm for gene prediction and its application to maize sequences. In Bioinformatics (Vol. 14, pp. 232–243). Oxford University Press. https://doi.org/10.1093/bioinformatics/14.3.232
Mendeley helps you to discover research relevant for your work.