At present, many techniques based on digital signal processing are available to predict protein coding regions in genomic sequences. However, accurate identification of these regions at the level of individual nucleotides remains a challenge. In this paper, we propose the novel use of a multi-dimensional feature and Gaussian mixture models to classify nucleotides as protein coding or non-coding. Employing signal-processing-based time-domain and frequency-domain features, the system described herein is shown to produce identification accuracies of more than 75% for protein coding nucleotides and more than 79% for non-coding nucleotides when evaluated on the GENSCAN data set.
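As a rough illustration of the kind of pipeline the abstract describes, the sketch below computes one frequency-domain feature per nucleotide and classifies it by comparing log-likelihoods under two Gaussian mixture models. It is not the authors' implementation. The choice of feature (windowed spectral power at period 3, a feature commonly used for coding regions), the window length, the function names, and all mixture parameters are assumptions made for illustration, and the feature here is one-dimensional, whereas the paper uses a multi-dimensional feature.

```python
import cmath
import math


def period3_power(seq, center, window=351):
    """Windowed spectral power at period 3 around position `center`.

    Assumed feature: for each base, take the binary indicator sequence
    inside the window, evaluate its DFT at frequency 1/3, and sum the
    squared magnitudes over the four bases, normalized by window length.
    """
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(seq), center + half + 1)
    total = 0.0
    for base in "ACGT":
        acc = 0j
        for n in range(lo, hi):
            if seq[n] == base:
                acc += cmath.exp(-2j * math.pi * (n - lo) / 3)
        total += abs(acc) ** 2
    return total / (hi - lo)


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    comps = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp for numerical stability.
    mx = max(comps)
    return mx + math.log(sum(math.exp(c - mx) for c in comps))


def classify(x, coding_gmm, noncoding_gmm):
    """Label a feature value by the mixture with the higher likelihood."""
    ll_coding = gmm_log_likelihood(x, *coding_gmm)
    ll_noncoding = gmm_log_likelihood(x, *noncoding_gmm)
    return "coding" if ll_coding > ll_noncoding else "non-coding"


# Hypothetical mixture parameters (weights, means, variances). In practice
# these would be fitted to labeled training nucleotides, e.g. with EM.
CODING_GMM = ([0.5, 0.5], [2.0, 3.0], [0.5, 0.5])
NONCODING_GMM = ([1.0], [0.2], [0.3])

if __name__ == "__main__":
    seq = "ATG" * 20  # toy sequence with perfect period-3 structure
    feature = period3_power(seq, center=30, window=9)
    print(classify(feature, CODING_GMM, NONCODING_GMM))
```

Classifying each nucleotide by the larger of the two class likelihoods amounts to a maximum-likelihood decision with equal class priors. A multi-dimensional version would replace the scalar Gaussian densities with multivariate ones.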