Many digital signal processing (DSP) techniques are now available for predicting protein-coding regions in genomic sequences. However, accurately identifying these regions at the level of individual nucleotides remains a challenge. In this paper, we propose the novel use of a multi-dimensional feature together with Gaussian mixture models (GMMs) to classify nucleotides as protein coding or non-coding. Using signal-processing-based time-domain and frequency-domain features, the proposed system achieves identification accuracies of more than 75% for protein-coding nucleotides and more than 79% for non-coding nucleotides when evaluated on the GENSCAN data set.
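As a rough illustration of the general idea (not the authors' implementation), the sketch below pairs one frequency-domain feature with class-specific Gaussian mixture models. It rests on several assumptions that are not in the abstract: the feature is the widely used period-3 spectral power of a fixed-length window, computed from binary base-indicator sequences; each GMM is one-dimensional with two components and is fitted by a small hand-written EM loop; classification is per window rather than per nucleotide; and the training data are synthetic, with a deliberately exaggerated codon-position bias. All function names are hypothetical.

```python
import cmath
import math
import random


def period3_power(window):
    """Period-3 spectral power: sum over A, C, G, T of |DFT at k = N/3|^2
    of the binary indicator sequence, divided by the window length N."""
    n = len(window)
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * math.pi * i / 3)
                    for i, b in enumerate(window) if b == base)
        total += abs(coeff) ** 2
    return total / n


def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k=2, iters=50):
    """Fit a 1-D GMM with EM; returns a list of (weight, mean, variance)."""
    srt = sorted(data)
    n = len(data)
    # Initialise the means at evenly spread quantiles of the data.
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    var0 = max(sum((x - centre) ** 2 for x in data) / n, 1e-6)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)
    return list(zip(weights, means, variances))


def log_likelihood(x, gmm):
    return math.log(sum(w * gauss_pdf(x, m, v) for w, m, v in gmm) + 1e-300)


def is_coding(x, gmm_coding, gmm_noncoding):
    """Pick the class whose GMM assigns the feature the higher likelihood."""
    return log_likelihood(x, gmm_coding) > log_likelihood(x, gmm_noncoding)


def synthetic_window(coding, rng, codons=30):
    """Toy data only: 'coding' windows favour G, A, C at codon positions
    0, 1, 2; 'non-coding' windows are uniformly random."""
    favoured = "GAC"
    seq = []
    for i in range(3 * codons):
        if coding and rng.random() < 0.6:
            seq.append(favoured[i % 3])
        else:
            seq.append(rng.choice("ACGT"))
    return "".join(seq)


rng = random.Random(0)
gmm_coding = fit_gmm([period3_power(synthetic_window(True, rng)) for _ in range(200)])
gmm_noncoding = fit_gmm([period3_power(synthetic_window(False, rng)) for _ in range(200)])

held_out = [(synthetic_window(label, rng), label) for label in [True, False] * 50]
correct = sum(is_coding(period3_power(w), gmm_coding, gmm_noncoding) == label
              for w, label in held_out)
accuracy = correct / len(held_out)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

The toy accuracy printed here says nothing about performance on real genomic data such as the GENSCAN set. A fuller version would replace the scalar feature with a multi-dimensional vector that combines time-domain and frequency-domain measures, and would slide the window along the sequence to label individual nucleotides.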
Akhtar, M., & Ambikairajah, E. and J. E. (2007). GMM-Based Classification of Genome Sequence. Digital Signal Processing, 103–106.