Abstract
Background: Enhancers are stretches of DNA (100-1000 bp) that play a major role in development, gene expression, evolution and disease. It has recently been shown that in higher eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding of transcription factors is sequence-specific, identifying functionally similar enhancers is very difficult and cannot be carried out with traditional alignment-based techniques.

Results: Fast similarity measures, such as alignment-free measures, are crucial for detecting related regulatory sequences and for understanding the functional correlation between two enhancers. In this paper we study the use of alignment-free measures for the classification of CRMs. Alignment-free measures, however, are generally tied to a fixed resolution k. Here we propose an alignment-free statistic, called EP*2, that is based on multiple-resolution patterns derived from the Entropic Profiles (EPs). The Entropic Profile is a function of the genomic location that captures the importance of that region with respect to the whole genome. As a byproduct, we provide a formula to compute the exact variance of variable-length word counts, a result that may be of general interest in other applications.

Conclusions: We evaluate several alignment-free statistics on simulated data and on real mouse ChIP-seq sequences. The new statistic, EP*2, is highly successful in discriminating functionally related enhancers and, in almost all experiments, it outperforms fixed-resolution methods. We implemented the new alignment-free measures, as well as traditional ones, in a software tool called EP-sim that is freely available: http://www.dei.unipd.it/~ciompin/main/EP-sim.html.
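To make the phrase "tied to a fixed resolution k" concrete, the following is a minimal sketch of a classic fixed-resolution alignment-free score: the inner product of the k-mer count vectors of two sequences, commonly known as the D2 statistic. It is an illustration only and is not the EP*2 statistic proposed in the paper; the function names `kmer_counts` and `d2` and the example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Fixed-resolution similarity: inner product of k-mer count vectors.

    Illustrative baseline only (assumption: plain D2, no normalization);
    this is NOT the multiple-resolution EP*2 statistic.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words missing from counts_b contribute 0 (Counter returns 0 for them).
    return sum(count * counts_b[word] for word, count in counts_a.items())


if __name__ == "__main__":
    a = "ACGTACGT"  # hypothetical toy sequences
    b = "ACGTTTTT"
    for k in (2, 4):
        print(k, d2(a, b, k))
```

The score for the same pair of sequences changes with the choice of k, which is the limitation that a multiple-resolution statistic is meant to address.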
Comin, M., & Antonello, M. (2016). On the comparison of regulatory sequences with multiple resolution Entropic Profiles. BMC Bioinformatics, 17(1). https://doi.org/10.1186/s12859-016-0980-2