Abstract
Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.
Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
© The Author 2011. Published by Oxford University Press. All rights reserved.
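The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the log-posterior odds score takes the common form of a motif log-likelihood ratio plus the log odds of a position-specific prior; the abstract does not spell out the exact formula, so this form is an assumption. The function name `log_posterior_odds`, the dictionary-based motif representation, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def log_posterior_odds(site, pwm, background, prior):
    """Score one candidate site (illustrative sketch, not the authors' code).

    site       -- DNA string with the same length as the motif
    pwm        -- list of dicts, one per motif position, base -> probability
    background -- dict, base -> background probability
    prior      -- assumed prior probability (0 < prior < 1) that a site at
                  this genomic position is bound, e.g. derived from an
                  epigenetic data track
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match motif width")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Log-likelihood ratio of the motif model versus the background model.
    llr = sum(math.log2(pwm[i][base] / background[base])
              for i, base in enumerate(site))
    # Adding the prior log odds turns the likelihood ratio into posterior odds.
    return llr + math.log2(prior / (1.0 - prior))

# Hypothetical two-position motif and uniform background.
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.5, "G": 0.25, "T": 0.125},
]
toy_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# The same sequence scores higher where the prior is high than where it is low.
score_open = log_posterior_odds("AC", toy_pwm, toy_background, prior=0.8)
score_closed = log_posterior_odds("AC", toy_pwm, toy_background, prior=0.2)
```

Under this assumed form, the prior shifts the score continuously instead of discarding sites outright, which is the contrast with a binary filter that the abstract draws.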
Cuellar-Partida, G., Buske, F. A., McLeay, R. C., Whitington, T., Noble, W. S., & Bailey, T. L. (2012). Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics, 28(1), 56–62. https://doi.org/10.1093/bioinformatics/btr614