Abstract
Background: Alignment-free methods of genomic comparison offer the possibility of scaling to large data sets of nucleotide sequences comprising several thousand or more base pairs. Such methods can be used to deduce "nearby" species in a reference data set or to construct phylogenetic trees. Results: We describe one such method that gives strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first using a Fourier trig transform and then a Singular Value Decomposition (SVD). This gives vectors of modest length. These in turn are used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets. Conclusions: The tandem of FCGR and dimension reduction using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.
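The pipeline described in the abstract (FCGR image, Fourier-type transform, SVD) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the corner assignment for A, C, G, T, the choice of an orthonormal DCT-II as the "Fourier trig transform", and the parameters `k`, `keep`, and `dims` are all hypothetical choices made for the example.

```python
import numpy as np

# Assumed corner assignment; conventions differ between publications.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq, k=4):
    """Normalized k-mer counts laid out on a 2^k x 2^k chaos-game grid."""
    n = 2 ** k
    grid = np.zeros((n, n))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer selects the coarsest quadrant,
        # matching the halving step of the chaos game.
        for ch in reversed(kmer):
            bx, by = CORNERS[ch]
            x = 2 * x + bx
            y = 2 * y + by
        grid[y, x] += 1
    total = grid.sum()
    return grid / total if total else grid

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as the Fourier-type transform."""
    j = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def features(seqs, k=4, keep=8, dims=2):
    """Map each sequence to a short vector: FCGR -> 2D DCT -> truncated SVD."""
    c = dct_matrix(2 ** k)
    rows = []
    for s in seqs:
        coeffs = c @ fcgr(s, k) @ c.T              # 2D transform of the image
        rows.append(coeffs[:keep, :keep].ravel())  # keep low-frequency block
    x = np.array(rows)
    u, sv, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :dims] * sv[:dims]                 # coordinates in top singular directions
```

With vectors of this kind, a lookup of "nearby" sequences reduces to a nearest-neighbour search, for example by Euclidean distance between rows of the array returned by `features`.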
Cite
Lichtblau, D. (2019). Alignment-free genomic sequence comparison using FCGR and signal processing. BMC Bioinformatics, 20(1). https://doi.org/10.1186/s12859-019-3330-3