Abstract
Background: The distribution and composition of cis -regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets. Methods: Genes with correlated expression patterns across 53 tissues and TF targets were respectively identified from Bray-Curtis Similarity and TF knockdown experiments. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters predicted these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed to in silico examine their impact on cluster densities and the regulatory states of target genes. Results: We initially chose the glucocorticoid receptor gene ( NR3C1 ), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to NR3C1 . A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs, which was found to be more accurate than those predicted after CRISPR/CAS9 inactivation. In-silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction. Conclusions : Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
Cite
CITATION STYLE
Lu, R., & Rogan, P. K. (2018). Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Research, 7, 1933. https://doi.org/10.12688/f1000research.17363.1
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.