Abstract
Motivation: Gene expression arrays measure transcript levels for a large number of genes, or even for every gene in the genome. To interpret these results better, and to use them to reconstruct transcription networks, we need to know where the binding sites for regulatory proteins are located across the entire genome. This remains an open problem, in particular for Escherichia coli.
Results: We describe the first application of dictionary-style models to the study of transcription factor binding sites in an entire genome. A unique feature of Vocabulon is that it can both reconstruct binding sites characterized by unknown motifs and infer the locations of known binding sites in long sequences through a simultaneous search. On the one hand, the dictionary model specifies a probability for the entire sequence that takes all possible binding sites into account simultaneously, which greatly reduces the number of false positives. On the other hand, motif descriptions can be refined as more binding sites are identified, which increases the sensitivity of the method. We illustrate these properties with examples in E. coli. The results of gene expression arrays are used both to guide the search and to corroborate it. © The Author 2004. Published by Oxford University Press. All rights reserved.
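The claim that a dictionary model assigns one probability to the whole sequence while accounting for every possible binding site at once can be illustrated with a small forward-backward sketch. This is a generic dictionary model, not Vocabulon's published algorithm: it assumes the sequence is a concatenation of independently drawn "words" (single nucleotides, plus motif words scored by a position weight matrix). The likelihood is the sum over all segmentations, and each candidate site receives a posterior probability that already reflects competing explanations of the same positions. All names (`forward_backward`, `site_posteriors`, the motif `M1`) and all probabilities are hypothetical. A genome-scale version would also need log-space or scaled arithmetic to avoid underflow.

```python
from typing import Dict, List, Tuple

# Hypothetical position weight matrix: one dict of base probabilities per position.
PWM = List[Dict[str, float]]


def word_prob(pwm: PWM, s: str) -> float:
    """Probability that a motif word emits the string s (len(s) == len(pwm))."""
    p = 1.0
    for column, base in zip(pwm, s):
        p *= column.get(base, 0.0)
    return p


def forward_backward(seq: str, rho: Dict[str, float],
                     motifs: Dict[str, PWM]) -> Tuple[float, List[float], List[float]]:
    """Sum over every segmentation of seq into dictionary words.

    rho maps each word (single bases 'A', 'C', 'G', 'T' and motif names) to its
    usage probability. fwd[i] is the summed weight of all segmentations of
    seq[:i]; bwd[i] is the same for seq[i:]. Z = fwd[n] is the sequence likelihood.
    """
    n = len(seq)
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        # The word ending at position i is a single background nucleotide ...
        total = rho.get(seq[i - 1], 0.0) * fwd[i - 1]
        # ... or a whole motif word covering seq[i-L:i].
        for name, pwm in motifs.items():
            L = len(pwm)
            if i >= L:
                total += rho[name] * word_prob(pwm, seq[i - L:i]) * fwd[i - L]
        fwd[i] = total

    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        # Same recursion read right to left: the word starting at position i.
        total = rho.get(seq[i], 0.0) * bwd[i + 1]
        for name, pwm in motifs.items():
            L = len(pwm)
            if i + L <= n:
                total += rho[name] * word_prob(pwm, seq[i:i + L]) * bwd[i + L]
        bwd[i] = total
    return fwd[n], fwd, bwd


def site_posteriors(seq: str, rho: Dict[str, float],
                    motifs: Dict[str, PWM], name: str) -> List[float]:
    """Posterior probability that motif `name` occupies seq[i:i+L], for each start i."""
    Z, fwd, bwd = forward_backward(seq, rho, motifs)
    pwm = motifs[name]
    L = len(pwm)
    return [fwd[i] * rho[name] * word_prob(pwm, seq[i:i + L]) * bwd[i + L] / Z
            for i in range(len(seq) - L + 1)]


def column(dominant: str) -> Dict[str, float]:
    """Hypothetical PWM column favouring one base (0.85 vs 0.05 for the others)."""
    return {b: 0.85 if b == dominant else 0.05 for b in "ACGT"}


# Toy usage with made-up parameters (not estimated from E. coli data).
motifs = {"M1": [column(b) for b in "TATA"]}
rho = {b: 0.2425 for b in "ACGT"}
rho["M1"] = 0.03  # usage probabilities of all words sum to 1.0
seq = "GCGCTATAGCGC"
post = site_posteriors(seq, rho, motifs, "M1")
best = max(range(len(post)), key=post.__getitem__)
print(best, round(post[best], 3))
```

In this toy run the exact match starting at index 4 receives most of the posterior mass, while partial matches receive very little. Because every site is scored against all other segmentations of the same sequence, raising the motif's usage probability or sharpening its PWM changes the posteriors of all candidate sites jointly, rather than thresholding each window in isolation.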
Citation
Sabatti, C., Rohlin, L., Lange, K., & Liao, J. C. (2005). Vocabulon: A dictionary model approach for reconstruction and localization of transcription factor binding sites. Bioinformatics, 21(7), 922–931. https://doi.org/10.1093/bioinformatics/bti083