Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest

12Citations
Citations of this article
45Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed in, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner, is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model. Results: We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TF (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.

Cite

CITATION STYLE

APA

Wang, X., Lin, P., & Ho, J. W. K. (2018). Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest. BMC Genomics, 19. https://doi.org/10.1186/s12864-017-4340-z

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free