Data mining techniques, applied through a range of algorithms, are well suited to predicting E. coli promoter regions. We studied classification, association and clustering algorithms, namely CART, Simple Logistic, BayesNet, Random forest, j48, LMT, Naïve Bayesian, Apriori and simpleKMeans, over different E. coli promoter datasets. Evaluated on the training dataset, the Random forest method outperformed the other classification methods. The association model (Apriori) predicted the presence of adenine (A) at positions -45, -10 and -11, thymine (T) at positions -35 and -36, and guanine (G) at position -34. According to the association model, cytosine (C) is not present at the -14 to -9 and -36 to -31 regions in the submitted DNA data of the E. coli promoter dataset. The cluster-based model using simpleKMeans predicted promoter regions as true at the -35 and -10 regions. If the -36 to -31 region of the sequence contains TTGACA and the -14 to -9 region contains TATAAT, the probability of finding a promoter in E. coli is highest. The condition becomes false if the -36 to -31 region contains ACGACG and the -14 to -9 region contains TGAATG. © 2012 Springer-Verlag GmbH Berlin Heidelberg.
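The hexamer rule stated at the end of the abstract can be illustrated with a minimal Python sketch. This is not the authors' method, only a direct encoding of the two stated conditions. It assumes that each sequence is given as a string whose first base sits at position -50 (the layout of the commonly used 57-nucleotide E. coli promoter dataset); `START_POS`, `region` and `hexamer_rule` are hypothetical names introduced here for illustration.

```python
# Assumption (not stated in the abstract): the first base of each
# sequence corresponds to position -50 relative to the transcription start.
START_POS = -50


def region(seq: str, first: int, last: int) -> str:
    """Return the bases from position `first` to `last`, inclusive."""
    lo = first - START_POS
    hi = last - START_POS + 1
    return seq[lo:hi].upper()


def hexamer_rule(seq: str) -> str:
    """Apply the two hexamer conditions described in the abstract."""
    box35 = region(seq, -36, -31)  # region around the -35 element
    box10 = region(seq, -14, -9)   # region around the -10 element
    if box35 == "TTGACA" and box10 == "TATAAT":
        return "promoter-like"
    if box35 == "ACGACG" and box10 == "TGAATG":
        return "non-promoter-like"
    # The abstract gives no rule for any other combination.
    return "undetermined"
```

A sequence that matches neither pair of hexamers is reported as undetermined, because the abstract only describes the two extreme cases; in the study itself such sequences would be handled by the trained classifiers.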
CITATION STYLE
Kaladhar, D. S. V. G. K., Uma Devi, T., Lakshmi, P. V., Harikrishna Reddy, R., Sriteja Ayayangar V., R. K., & Nageswara Rao, P. V. (2012). Analysis of E.coli promoter regions using classification, association and clustering algorithms. In Advances in Intelligent and Soft Computing (Vol. 132 AISC, pp. 169–177). Springer Verlag. https://doi.org/10.1007/978-3-642-27443-5_20