The annotation of DNA regions that regulate gene transcription is the first step towards understanding phenotypic differences among cells and many diseases. Hypersensitive (HS) sites are reliable markers of regulatory regions. Mapping HS sites is the focus of many statistical learning techniques that employ Support Vector Machines (SVMs) to classify a DNA sequence as HS or non-HS. The contribution of this paper is a novel methodology, inspired by biological evolution, that automates the basic steps of SVM-based classification and improves classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs, which are used to associate feature vectors with the input sequences. Second, a genetic programming algorithm designs optimal kernel functions that map the feature vectors into a high-dimensional space where the vectors can be optimally separated into the HS and non-HS classes. Results show that employing evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
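The first of the two steps described above, evolving sequence motifs that define feature vectors, can be illustrated with a minimal sketch. This is not the authors' algorithm: the motif representation (fixed-length exact k-mers), the toy fitness (difference in mean motif count between classes), the (mu + lambda) selection scheme, and all function names and parameters are assumptions chosen only to make the idea concrete. Kernel evolution by genetic programming is not shown.

```python
import random

ALPHABET = "ACGT"


def count_motif(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def featurize(seq, motifs):
    """Map a sequence to a feature vector of motif counts."""
    return [count_motif(seq, m) for m in motifs]


def fitness(motif, positives, negatives):
    """Toy fitness (an assumption): absolute difference in mean count
    between the HS (positive) and non-HS (negative) example sequences."""
    pos = sum(count_motif(s, motif) for s in positives) / len(positives)
    neg = sum(count_motif(s, motif) for s in negatives) / len(negatives)
    return abs(pos - neg)


def mutate(motif, rng):
    """Replace one randomly chosen position with a random nucleotide."""
    i = rng.randrange(len(motif))
    return motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]


def evolve_motifs(positives, negatives, motif_len=4, pop_size=20,
                  generations=30, seed=0):
    """Hypothetical (mu + lambda) loop: mutate every motif, then keep the
    best unique candidates. Parents compete with offspring, so the best
    fitness never decreases from one generation to the next."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(motif_len))
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(m, rng) for m in population]
        # Sort by descending fitness; break ties by motif for determinism.
        pool = sorted(set(population + offspring),
                      key=lambda m: (-fitness(m, positives, negatives), m))
        population = pool[:pop_size]
    return population


if __name__ == "__main__":
    # Made-up toy sequences, not real HS data.
    hs = ["GATAGATAGATA", "TTGATAGATACC"]
    non_hs = ["CCCCCCCCCCCC", "ACACACACACAC"]
    motifs = evolve_motifs(hs, non_hs)
    print(motifs[:5])
    print(featurize(hs[0], motifs[:5]))
```

The vectors returned by `featurize` are the kind of input an SVM would then classify. In the methodology summarized above, the kernel applied to those vectors is itself evolved rather than fixed in advance.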
Kamath, U., Shehu, A., & De Jong, K. A. (2012). Feature and kernel evolution for recognition of hypersensitive sites in DNA sequences. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (Vol. 87 LNICST, pp. 213–228). https://doi.org/10.1007/978-3-642-32615-8_23