Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model

Metin Balaban; Nishat Anjum Bristy; Ahnaf Faisal; Md Shamsuzzoha Bayzid; Siavash Mirarab

Journal Article, Open Access

Bioinformatics Advances (2022) 2(1)

DOI: 10.1093/bioadv/vbac055

Abstract

Summary: While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data.
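The letter-replacement idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' TK4 estimator. The function names, the toy sequences, and the choice of merging G into A are all hypothetical, and the sketch assumes both sequences are read from the same strand, which the unknown-strand setting of the paper does not allow. The conversion from a Jaccard index to a mismatch fraction is the standard k-mer approximation used by tools such as Mash and Skmer, not the TK4 formula.

```python
def kmer_set(seq, k, replace=None):
    """Return the set of k-mers in seq, after an optional letter replacement.

    `replace` is a hypothetical mapping such as {"G": "A"} that merges letters
    before k-mers are extracted.
    """
    if replace:
        seq = seq.translate(str.maketrans(replace))
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mismatch_fraction(j, k):
    """Standard k-mer approximation: J = p / (2 - p) with p = (1 - d)^k."""
    if j <= 0:
        return 1.0
    return 1.0 - (2.0 * j / (1.0 + j)) ** (1.0 / k)


# Toy sequences that differ by a single G -> A substitution.
x = "ACGTACGTTGCA"
y = "ACGTACATTGCA"
k = 4

# Jaccard index on the original alphabet.
j_plain = jaccard(kmer_set(x, k), kmer_set(y, k))

# Jaccard index after merging G into A, which hides A<->G substitutions.
merge = {"G": "A"}
j_merged = jaccard(kmer_set(x, k, merge), kmer_set(y, k, merge))

d_plain = mismatch_fraction(j_plain, k)
d_merged = mismatch_fraction(j_merged, k)
```

In this toy case the two sequences become identical once G is merged into A, so the merged Jaccard index rises relative to the plain one. The gap between the two indices reflects how many of the observed differences are A<->G substitutions, which is the kind of per-substitution signal a richer model needs. On real genomes the merged alphabet also produces chance k-mer matches, which is the effect the summary says must be accounted for.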

Cite

Balaban, M., Bristy, N. A., Faisal, A., Bayzid, M. S., & Mirarab, S. (2022). Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances, 2(1). https://doi.org/10.1093/bioadv/vbac055
