Detecting motifs in a large data set: Applying probabilistic insights to motif finding

Citations of this article: 7
Readers (Mendeley users who have this article in their library): 7

Abstract

We give a probabilistic algorithm for Consensus Sequence, an NP-complete subproblem of motif recognition, which can be described as follows: given a set of l-length sequences, determine whether there exists a sequence that has Hamming distance at most d from every sequence in the set. We demonstrate that the distance between a randomly selected majority sequence and a consensus sequence decreases as the size of the data set increases. Applying our probabilistic paradigms and insights to motif recognition, we develop pMCL-WMR, a program capable of detecting motifs in large synthetic and real genomic data sets. Our results show that detecting motifs becomes easier and more efficient as the number of sequences in the data set increases, a surprising and counter-intuitive fact that has significant impact on this deeply investigated area. © Springer-Verlag Berlin Heidelberg 2009.
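The problem statement in the abstract can be made concrete with a small sketch. The Python below is an illustration only and is not the authors' probabilistic algorithm or pMCL-WMR. It assumes a DNA alphabet, breaks ties in the majority sequence alphabetically (the abstract instead speaks of a randomly selected majority sequence), and uses an exhaustive search that is only practical for very short sequences. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def majority_sequence(seqs):
    """Column-wise most frequent symbol.

    Assumption: ties are broken alphabetically so the result is
    deterministic; a random tie-break would match the abstract's wording.
    """
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(counts.values())
        out.append(min(s for s, c in counts.items() if c == best))
    return "".join(out)


def radius(center, seqs):
    """Largest Hamming distance from center to any sequence in seqs."""
    return max(hamming(center, s) for s in seqs)


def find_consensus(seqs, d, alphabet="ACGT"):
    """Brute-force decision procedure for Consensus Sequence.

    Tries every string of length l over the alphabet, so the running
    time is exponential in l. Returns a witness or None.
    """
    length = len(seqs[0])
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        if radius(center, seqs) <= d:
            return center
    return None
```

For example, with the four sequences `ACGT`, `ACGA`, `ACCT` and `TCGT`, the majority sequence is `ACGT`, which lies within Hamming distance 1 of every input, so the instance with d = 1 has a consensus sequence while the instance with d = 0 does not.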

Cite

Boucher, C., & Brown, D. G. (2009). Detecting motifs in a large data set: Applying probabilistic insights to motif finding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5462 LNBI, pp. 139–150). https://doi.org/10.1007/978-3-642-00727-9_15
