Identifying cis-regulatory sequences by word profile similarity

Abstract

Background: Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples.

Methodology/Principal Findings: We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila.

Conclusions/Significance: Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.
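
The general idea of comparing genomic regions to a known cis-regulatory sequence by word composition can be illustrated with a small sketch. This is not the WPH-finder implementation, and the abstract does not state which similarity measure the authors use. The function names (`word_profile`, `cosine_similarity`, `scan`), the word length `k`, the window and step sizes, and the choice of cosine similarity are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def word_profile(seq, k=3):
    """Count every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity of two word-count profiles.

    Assumption: cosine similarity stands in for whatever measure
    the paper actually uses; it is chosen here only for simplicity.
    """
    # Counter returns 0 for words missing from q, so unshared words add nothing.
    dot = sum(count * q[word] for word, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def scan(genome, reference, window=16, step=4, k=3):
    """Score sliding windows of genome against a reference sequence.

    Returns a list of (window_start, similarity) pairs.
    """
    ref_profile = word_profile(reference, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = word_profile(genome[start:start + window], k)
        scores.append((start, cosine_similarity(ref_profile, profile)))
    return scores


# Toy data (hypothetical): a made-up "known" sequence embedded in GC repeats.
reference = "TTAATCCGTTAATCCG"
genome = "GC" * 10 + reference + "GC" * 10
best_start, best_score = max(scan(genome, reference), key=lambda s: s[1])
```

On this toy input the highest-scoring window is the one that contains the embedded reference. A realistic scan would also need to handle both strands (reverse complements), ambiguous bases, and a background model, none of which this sketch attempts.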

Leung, G., & Eisen, M. B. (2009). Identifying cis-regulatory sequences by word profile similarity. PLoS ONE, 4(9). https://doi.org/10.1371/journal.pone.0006901
