Abstract
Genomic sequence processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing researchers in recent years. A challenging open problem in deoxyribonucleic acid (DNA) sequence analysis is maximizing the prediction accuracy of eukaryotic gene locations and thereby protein coding regions. In this paper, DNA symbolic-to-numeric representations are presented and compared with existing techniques in terms of relative accuracy for the gene and exon prediction problem. Novel signal processing-based gene and exon prediction methods are then evaluated together with existing approaches at a nucleotide level using the Burset/Guigo1996, HMR195, and GENSCAN standard genomic datasets. A new technique for the recognition of acceptor splice sites is then proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method. By comparison with the acceptor splice site detection method used in the gene-finding program GENSCAN, the proposed DSP-statistical hybrid technique reveals a consistent reduction in false positives at different levels of sensitivity, averaging a 43% reduction when evaluated on the GENSCAN test set. © 2008 IEEE.
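To make the idea of a symbolic-to-numeric representation concrete, the following is a minimal sketch of one widely used existing mapping, the binary indicator (Voss) representation, followed by a period-3 spectral measure of the kind commonly used in DSP-based exon prediction. It is an illustration only and is not claimed to be any of the representations or methods proposed in the paper; the function names `voss_indicators` and `period3_power` are hypothetical.

```python
import numpy as np

def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss representation).

    Illustrative only: not necessarily a representation proposed in the paper.
    """
    seq = seq.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in seq]) for b in "ACGT"}

def period3_power(seq):
    """Sum over the four bases of the squared DFT magnitude at frequency 1/3.

    Coding regions tend to show a stronger period-3 component than
    non-coding regions, which is what DSP-based exon predictors exploit.
    """
    k = np.arange(len(seq))
    phase = np.exp(-2j * np.pi * k / 3)  # DFT basis at frequency 1/3
    return sum(abs(np.dot(x, phase)) ** 2 for x in voss_indicators(seq).values())

if __name__ == "__main__":
    # A perfectly codon-periodic toy sequence versus a sequence with no period-3 structure.
    print(period3_power("ATG" * 30))
    print(period3_power("A" * 90))
```

In practice such a measure is usually computed over a sliding window along the sequence, so that peaks can be compared against annotated exon positions at the nucleotide level.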
Cite
Akhtar, M., Epps, J., & Ambikairajah, E. (2008). Signal processing in sequence analysis: Advances in eukaryotic gene prediction. IEEE Journal on Selected Topics in Signal Processing, 2(3), 310–321. https://doi.org/10.1109/JSTSP.2008.923854