Analysis of E.coli promoter recognition problem in dinucleotide feature space

28Citations
Citations of this article
24Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Motivation: Patterns in the promoter sequences within a species are known to be conserved but there exist many exceptions to this rule which makes the promoter recognition a complex problem. Although many complex feature extraction schemes coupled with several classifiers have been proposed for promoter recognition in the current literature, the problem is still open. Results: A dinucleotide global feature extraction methodis proposed for the recognition of sigma-70 promoters in Escherichia coli in this article. The positive data set consists of sigma-70 promoters with known transcription starting points which are part of regulonDB and promec databases. Four different kinds of negative data sets are considered, two of them biological sets (Gordon et al., 2003) and the other two synthetic data sets. Our results reveal that a single-layer perceptron using dinucleotide features is able to achieve an accuracy of 80% against a background of biological non-promoters and 96% for random data sets. A scheme for locating the promoter regions in a given genome sequence is proposed. A deeper analysis of the data set shows that there is a bifurcation of the data set into two distinct classes, a majority class and a minority class. Our results point out that majority class constituting the majority promoter and the majority non-promoter signal is linearly separable. Also the minority class is linearly separable. We further show that the feature extraction and classification methods proposed in the paper are generic enough to be applied to the more complex problem of eucaryotic promoter recognition. We present Drosophila promoter recognition as a case study. © 2007 Oxford University Press.

Cite

CITATION STYLE

APA

Rani, T. S., Bhavani, S. D., & Bapi, R. S. (2007). Analysis of E.coli promoter recognition problem in dinucleotide feature space. Bioinformatics, 23(5), 582–588. https://doi.org/10.1093/bioinformatics/btl670

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free