Detecting motifs in a large data set: Applying probabilistic insights to motif finding

Christina Boucher; Daniel G. Brown

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5462 LNBI 139-150

DOI: 10.1007/978-3-642-00727-9_15

Abstract

We give a probabilistic algorithm for Consensus Sequence, an NP-complete subproblem of motif recognition, which can be described as follows: given a set of l-length sequences, determine whether there exists a sequence that has Hamming distance at most d from every sequence in the set. We demonstrate that the distance between a randomly selected majority sequence and a consensus sequence decreases as the size of the data set increases. Applying our probabilistic paradigms and insights to motif recognition, we develop pMCL-WMR, a program capable of detecting motifs in large synthetic and real genomic data sets. Our results show that detecting motifs in data sets becomes easier and more efficient as the size of the set of sequences increases, a surprising and counter-intuitive fact that has significant impact on this deeply investigated area. © Springer-Verlag Berlin Heidelberg 2009.
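The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' probabilistic algorithm and not pMCL-WMR; it only shows what Hamming distance, a majority sequence, and a consensus sequence mean. All function names are hypothetical. The DNA alphabet and the deterministic tie-breaking rule are assumptions (the abstract refers to a randomly selected majority sequence). The brute-force search is exponential in l and is usable only on toy inputs.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def majority_sequence(seqs):
    """Pick the most frequent symbol in each column.

    Assumption: ties are broken by taking the smallest symbol, so the
    result is deterministic; a random tie-break would match the abstract's
    "randomly selected majority sequence" more closely.
    """
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(counts.values())
        out.append(min(sym for sym, c in counts.items() if c == best))
    return "".join(out)


def is_consensus(candidate, seqs, d):
    """True if candidate is within Hamming distance d of every sequence."""
    return all(hamming(candidate, s) <= d for s in seqs)


def find_consensus_bruteforce(seqs, d, alphabet="ACGT"):
    """Exhaustive search over all |alphabet|^l candidates (toy inputs only).

    Returns the first consensus sequence found, or None if none exists.
    """
    length = len(seqs[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if is_consensus(candidate, seqs, d):
            return candidate
    return None
```

For example, calling `majority_sequence(seqs)` on a small list of equal-length DNA strings and passing the result to `is_consensus(..., seqs, d)` checks whether the majority sequence already solves the instance, which is the situation the abstract says becomes more likely as the data set grows.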

Cite

Boucher, C., & Brown, D. G. (2009). Detecting motifs in a large data set: Applying probabilistic insights to motif finding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5462 LNBI, pp. 139–150). https://doi.org/10.1007/978-3-642-00727-9_15
