A Phase Transition for the Score in Matching Random Sequences Allowing Deletions

  • Arratia R
  • Waterman M

Abstract

We consider a sequence matching problem involving the optimal alignment score for contiguous subsequences, rewarding matches and penalizing for deletions and mismatches. This score is used by biologists comparing pairs of DNA or protein sequences. We prove that for two sequences of length n, as n → ∞, there is a phase transition between linear growth in n, when the penalty parameters are small, and logarithmic growth in n, when the penalties are large. The results are valid for independent sequences with iid or Markov letters. The crucial step in proving this is to derive a large deviation result for matching with deletions. The longest common subsequence problem of Chvátal and Sankoff is a special case of our setup. The proof of the large deviation result exploits the Azuma-Hoeffding lemma. The phase transition is also established for more general scoring schemes allowing general letter-to-letter alignment penalties and block deletion penalties. We give a general method for applying the bounded increments martingale method to Lipschitz functionals of Markov processes. The phase transition holds for matching Markov chains and for nonoverlapping repeats in a single sequence.
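The abstract does not spell out the scoring recursion, so the following is a minimal Python sketch under stated assumptions: a match reward of 1, a per-letter mismatch penalty `mismatch`, and a per-letter deletion penalty `indel` (linear costs, not the block deletion penalties the paper also treats). `local_score` computes the best alignment score over contiguous subsequences with the standard Smith-Waterman-style dynamic program; `global_score` aligns the two whole sequences, and with both penalties set to 0 it reduces to the longest common subsequence length, the Chvátal-Sankoff special case. All function names and the simulation helper are hypothetical illustrations, not code from the paper, and the simulation only makes the two growth regimes visible empirically; it proves nothing.

```python
import random


def local_score(a, b, mismatch=1.0, indel=1.0, match=1.0):
    """Best alignment score over all pairs of contiguous subsequences of a and b."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else -mismatch)
            # The floor at 0 lets an alignment restart anywhere (local alignment).
            cur[j] = max(0.0, diag, prev[j] - indel, cur[j - 1] - indel)
            best = max(best, cur[j])
        prev = cur
    return best


def global_score(a, b, mismatch=1.0, indel=1.0, match=1.0):
    """Score of the best alignment of the whole sequences (no zero floor)."""
    prev = [-indel * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-indel * i] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else -mismatch)
            cur[j] = max(diag, prev[j] - indel, cur[j - 1] - indel)
        prev = cur
    return prev[-1]


def mean_local_score(n, mismatch, indel, trials=3, alphabet="ACGT", seed=0):
    """Average local score of two independent iid uniform sequences of length n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.choice(alphabet) for _ in range(n)]
        b = [rng.choice(alphabet) for _ in range(n)]
        total += local_score(a, b, mismatch, indel)
    return total / trials


if __name__ == "__main__":
    for n in (50, 100, 200):
        small = mean_local_score(n, mismatch=0.1, indel=0.1)
        large = mean_local_score(n, mismatch=3.0, indel=3.0)
        print(f"n={n:4d}  small penalties: {small:7.2f}  large penalties: {large:6.2f}")
```

Doubling n should roughly double the small-penalty averages, while the large-penalty averages should grow only slowly. The penalty values 0.1 and 3.0 are assumptions chosen to sit well on either side of the transition for uniform letters over a four-letter alphabet, and averages at these small n are noisy.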

Citation (APA)

Arratia, R., & Waterman, M. S. (1994). A Phase Transition for the Score in Matching Random Sequences Allowing Deletions. The Annals of Applied Probability, 4(1). https://doi.org/10.1214/aoap/1177005208
