Optimal computation of all tandem repeats in a weighted sequence

Abstract

Background: In molecular biology, tandem duplication results from mutational events in which an original segment of DNA is converted into a sequence of adjacent copies. More formally, a repetition, or tandem repeat, in a string is a concatenation of two or more identical adjacent factors of the string. Biologists, however, are interested in approximate tandem repeats, not only in exact ones. A weighted sequence is a string in which, at each position, any letter from a given set may occur, each with its own probability of occurrence. Weighted sequences arise naturally in many biological contexts and provide a way to model the approximation among distinct adjacent occurrences of the same DNA segment.

Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981 and was the first optimal $O(n \log n)$-time algorithm for computing all repetitions in a string of length $n$. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences that requires optimal $O(n \log n)$ time, thus improving on the best known $O(n^2)$-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length $n$.
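
The partitioning idea behind Crochemore's algorithm can be illustrated on an ordinary (unweighted) string. The Python sketch below is a simplified, naive version; it is neither the authors' weighted variant nor Crochemore's optimal algorithm. It refines the equivalence classes $E_p$ (positions $i$ and $j$ share a class when the length-$p$ factors starting at them are equal) one level at a time, and reports a square of period $p$ starting at $i$ whenever $i$ and $i+p$ fall in the same class of $E_p$. Each refinement step here is linear, so the sketch runs in $O(n^2)$ time overall; Crochemore's $O(n \log n)$ bound comes, roughly, from using only the small subclasses (all but one largest subclass of each split class) as splitters, which is omitted here. The function name `squares_by_partitioning` and the 0-indexed `(i, p)` output format are illustrative assumptions.

```python
def squares_by_partitioning(x: str) -> list[tuple[int, int]]:
    """Report every square x[i:i+2p] == u + u (|u| = p) as a pair (i, p), 0-indexed.

    Hypothetical, naive sketch of the partitioning idea (not Crochemore's
    O(n log n) algorithm): positions are grouped into classes E_p where
    i and j share a class iff x[i:i+p] == x[j:j+p]. A square of period p
    starts at i exactly when i and i + p lie in the same class of E_p.
    """
    n = len(x)
    # E_1: the class of position i is determined by the letter x[i].
    ids: dict[object, int] = {}
    cls = [ids.setdefault(c, len(ids)) for c in x]
    squares = []
    p = 1
    while 2 * p <= n:
        # Here cls covers positions 0..n-p, i.e. every start of a length-p factor.
        for i in range(n - 2 * p + 1):
            if cls[i] == cls[i + p]:
                squares.append((i, p))
        # Refine E_p into E_{p+1}: x[i:i+p+1] == x[j:j+p+1]
        # iff i ~_p j and (i + 1) ~_p (j + 1).
        ids = {}
        cls = [ids.setdefault((cls[i], cls[i + 1]), len(ids)) for i in range(n - p)]
        p += 1
    return squares


print(squares_by_partitioning("abab"))  # [(0, 2)]
```

Listing squares individually can produce $\Theta(n^2)$ output (for example, on $a^n$), so this sketch is only meant to show how the classes expose repetitions, not to reproduce the optimal bound.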

Citation (APA)

Barton, C., Iliopoulos, C. S., & Pissis, S. P. (2014). Optimal computation of all tandem repeats in a weighted sequence. Algorithms for Molecular Biology, 9(1). https://doi.org/10.1186/s13015-014-0021-5