NestedMICA: Sensitive inference of over-represented motifs in nucleic acid sequence

This article is free to access.

Abstract

NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means that NestedMICA can find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and on a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs that were good predictors for all five abundant classes of annotated binding sites. © The Author 2005. Published by Oxford University Press. All rights reserved.
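
To make concrete the idea of Nested Sampling needing no initialization or seeding step, the following is a minimal, generic sketch of the technique on a toy one-dimensional problem. It is not the NestedMICA implementation and does not reproduce its motif mixture model: the Gaussian-shaped likelihood, the uniform prior on [0, 1], and the names `likelihood` and `nested_sampling_evidence` are all assumptions chosen for illustration. The live points start as plain random draws from the prior, and each step replaces the worst one with a prior draw of higher likelihood.

```python
import math
import random


def likelihood(theta, center=0.5, width=0.1):
    """Toy likelihood (an assumption): an unnormalized Gaussian bump on [0, 1]."""
    return math.exp(-((theta - center) ** 2) / (2.0 * width ** 2))


def nested_sampling_evidence(n_live=100, n_iter=600, seed=0):
    """Estimate Z, the integral of likelihood(theta) under a uniform prior on [0, 1]."""
    rng = random.Random(seed)

    # Live points are ordinary draws from the prior; no seeding heuristic is used.
    live = [rng.random() for _ in range(n_live)]
    live_l = [likelihood(t) for t in live]

    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_l.__getitem__)
        l_min = live_l[worst]

        # Enclosed prior mass shrinks by a factor of about exp(-1/n_live) per step.
        x_curr = math.exp(-i / n_live)
        evidence += l_min * (x_prev - x_curr)
        x_prev = x_curr

        # Replace the worst point with a prior draw that beats l_min.
        # Simple rejection sampling is only practical for toy problems like this.
        while True:
            candidate = rng.random()
            cand_l = likelihood(candidate)
            if cand_l > l_min:
                break
        live[worst] = candidate
        live_l[worst] = cand_l

    # Add the contribution of the prior mass still covered by the live points.
    evidence += x_prev * sum(live_l) / n_live
    return evidence


if __name__ == "__main__":
    exact = 0.1 * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (0.1 * math.sqrt(2.0)))
    print("nested sampling estimate:", nested_sampling_evidence())
    print("analytic value:", exact)
```

In this toy setting the estimate can be checked against the analytic integral. A realistic application would need a smarter way of drawing constrained replacement points than rejection sampling from the prior.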

Down, T. A., & Hubbard, T. J. P. (2005). NestedMICA: Sensitive inference of over-represented motifs in nucleic acid sequence. Nucleic Acids Research, 33(5), 1445–1453. https://doi.org/10.1093/nar/gki282
