Efficient estimation of pairwise distances between genomes

This article is free to access.

Abstract

Motivation: Genome comparison is central to contemporary genomics and typically relies on sequence alignment. However, genome-wide alignments are difficult to compute. We have, therefore, recently developed an accurate alignment-free estimator of the number of substitutions per site based on the lengths of exact matches between pairs of sequences. The previous implementation of this measure requires n(n-1) suffix tree constructions and traversals, where n is the number of sequences analyzed. This does not scale well for large n.

Results: We present an algorithm to extract all (n choose 2) pairwise distances in a single traversal of a single suffix tree containing n sequences. As a result, the run time of the suffix tree construction phase of our algorithm is reduced from O(n^2 L) to O(nL), where L is the length of each sequence. We implement this algorithm in the program kr version 2 and apply it to 825 HIV genomes, 13 genomes of enterobacteria and the complete genomes of 12 Drosophila species. We show that, depending on the input dataset, the new program is at least 10 times faster than its predecessor.

© The Author 2009. Published by Oxford University Press.
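To make the scaling argument concrete, the following minimal Python sketch tabulates the two counts mentioned in the abstract for the three datasets it names: the n(n-1) suffix tree constructions attributed to the previous implementation, and the (n choose 2) pairwise distances that the new algorithm extracts from a single tree. The function name `work_counts` and the dataset labels are illustrative assumptions, not part of kr; the sketch only does the arithmetic and says nothing about actual run times.

```python
from math import comb


def work_counts(n):
    """Return (n(n-1), n choose 2) for n sequences.

    Hypothetical helper for illustration only:
    - n(n-1) is the number of suffix tree constructions the abstract
      attributes to the previous implementation;
    - (n choose 2) is the number of pairwise distances to be estimated.
    """
    return n * (n - 1), comb(n, 2)


# Sequence counts taken from the abstract; the labels are informal.
datasets = [("HIV", 825), ("enterobacteria", 13), ("Drosophila", 12)]

for label, n in datasets:
    constructions, pairs = work_counts(n)
    print(f"{label}: n={n}, n(n-1)={constructions}, pairs={pairs}")
```

For the 825 HIV genomes this gives 679,800 constructions under the old scheme against one tree under the new one, with 339,900 distances to report in either case.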

Domazet-Lošo, M., & Haubold, B. (2009). Efficient estimation of pairwise distances between genomes. Bioinformatics, 25(24), 3221–3227. https://doi.org/10.1093/bioinformatics/btp590
