Genome sequencing has allowed many gene regulatory elements to be identified through cross-species comparisons [1-5]. However, the expression of genes in multigene families can diverge rapidly between related species [6-9]. An alternative approach to characterizing multigene families utilizes the fact that genes within the group are likely to share aspects of their regulation. Here, we use a statistical approach, probabilistic segmentation, to identify sequences that are overrepresented in the regions upstream of C. elegans chemosensory receptor genes. Although each of these elements is present in only a subset of the genes, their distribution across and within the promoters of chemosensory receptor genes makes it possible to detect them. Many of the motifs show positional preference with respect to the translational start site and correspond to the binding sites of known families of transcription factors. We verified one motif, the E-box sequence WWYCACSTGYY, by showing that it directs expression of reporter genes to the ADL chemosensory neurons. Thus, probabilistic segmentation can be used to identify functional regulatory elements with no previous knowledge of gene expression or regulation. This approach may be of particular value for rapidly evolving genes in the immune system and the nervous system.
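As an illustration of how a consensus such as WWYCACSTGYY can be located in upstream sequence and positioned relative to the translational start site, the following minimal Python sketch expands the standard IUPAC nucleotide codes into a regular expression and scans one strand. It is not the probabilistic segmentation method itself, which discovers motifs without a known consensus. The function names, the example sequence, and the assumption that the sequence ends immediately before the start codon are all hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "Y": "[CT]", "R": "[AG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC consensus into a regex (hypothetical helper)."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    # A lookahead is used so that overlapping occurrences are all reported.
    return re.compile("(?=(" + body + "))")


def find_motif(upstream, motif):
    """Return (offset, match) pairs for one strand of an upstream sequence.

    Assumption: `upstream` ends immediately before the start codon, so the
    offset is the negative distance from the translational start site.
    """
    seq = upstream.upper()
    pattern = motif_to_regex(motif)
    return [(m.start() - len(seq), m.group(1)) for m in pattern.finditer(seq)]


# Made-up upstream fragment containing one instance of the E-box consensus.
upstream = "GGGATCCACGTGCTAAAA"
hits = find_motif(upstream, "WWYCACSTGYY")
print(hits)
```

Only the given strand is scanned here; a fuller analysis would also scan the reverse complement and compare motif counts against a background model.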
McCarroll, S. A., Li, H., & Bargmann, C. I. (2005). Identification of transcriptional regulatory elements in chemosensory receptor genes by probabilistic segmentation. Current Biology, 15(4), 347–352. https://doi.org/10.1016/j.cub.2005.02.023