Estimating Sequence Similarity from Contig Sets

Abstract

A key task in computational biology is to determine the mutual similarity of two genomic sequences. Current biotechnologies are usually unable to determine the full sequential content of a genome from biological material; instead, they produce a set of large substrings (contigs) whose order and relative positions within the genome are unknown. Here we design a function that estimates the sequential similarity (in terms of the inverse Levenshtein distance) of two genomes, given their respective contig sets. Our approach consists of two steps: the first is based on an adaptation of the tractable Smith-Waterman local alignment algorithm, and the second on a reduction to the weighted interval scheduling problem, which can be solved efficiently with dynamic programming. In hierarchical-clustering experiments with Influenza and Hepatitis genomes, our approach outperforms the standard baseline in which only the longest contigs are compared. In high-coverage settings, it also outperforms the estimates produced by the recent method [8], which avoids contig construction completely.
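The second step mentioned above reduces to weighted interval scheduling. As an illustration only, here is a minimal Python sketch of the textbook dynamic program for that problem. The abstract does not say how alignments are mapped to intervals or weights, so the function name `max_weight_schedule`, the `(start, end, weight)` tuple layout, and the half-open interval convention are all assumptions made for this example, not the authors' formulation.

```python
from bisect import bisect_right


def max_weight_schedule(intervals):
    """Return the maximum total weight of pairwise non-overlapping intervals.

    `intervals` is a list of (start, end, weight) tuples. Intervals are
    treated as half-open [start, end), so one interval may begin exactly
    where another ends. This layout is an assumption of the sketch.
    """
    # Process intervals in order of increasing end point.
    ordered = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ordered]

    # best[i] = optimum using only the first i intervals of `ordered`.
    best = [0] * (len(ordered) + 1)
    for i, (start, _end, weight) in enumerate(ordered, start=1):
        # p = how many of the earlier intervals end at or before `start`,
        # i.e. the prefix that is compatible with the current interval.
        p = bisect_right(ends, start, 0, i - 1)
        # Either skip the current interval or take it on top of best[p].
        best[i] = max(best[i - 1], best[p] + weight)
    return best[-1]


# Hypothetical example: four overlapping candidate regions with scores.
example = [(0, 3, 5), (2, 5, 6), (4, 7, 5), (6, 9, 4)]
result = max_weight_schedule(example)
```

Sorting takes O(n log n) and each step performs one binary search, so the whole procedure runs in O(n log n) time for n intervals, which is what makes the problem efficiently solvable.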

Ryšavý, P., & Železný, F. (2017). Estimating Sequence Similarity from Contig Sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10584 LNCS, pp. 272–283). Springer Verlag. https://doi.org/10.1007/978-3-319-68765-0_23
