GDNorm: An improved Poisson regression model for reducing biases in Hi-C data

Abstract

As a revolutionary tool, the Hi-C technology can be used to capture genomic segments that lie in close spatial proximity in three-dimensional space, enabling the study of chromosome structures at an unprecedentedly high throughput and resolution. However, during the experimental steps of Hi-C, systematic biases from different sources are often introduced into the resultant data (i.e., reads or read counts). Several bias reduction methods have been proposed recently. Although both systematic biases and spatial distance are known to be key factors determining the number of observed chromatin interactions, the existing bias reduction methods in the literature do not include spatial distance explicitly in their computational models for estimating the interactions. In this work, we propose an improved Poisson regression model that takes spatial distance into consideration, together with an efficient gradient-descent-based algorithm, GDNorm, for reducing biases in Hi-C data. GDNorm has been tested on both simulated and real Hi-C data, and its performance has been compared with that of the state-of-the-art bias reduction methods. The experimental results show that our improved Poisson model provides more accurate normalized contact frequencies (measured in read counts) between interacting genomic segments, and thus a more accurate chromosome structure prediction when combined with a chromosome structure determination method such as ChromSDE. Moreover, when assessed on recently published data from human lymphoblastoid and mouse embryonic stem cell lines, GDNorm achieves the highest reproducibility between the biological replicates of the cell lines. The normalized contact frequencies obtained by GDNorm are well correlated with the spatial distances measured by fluorescence in situ hybridization (FISH) experiments. In addition to accurate bias reduction, GDNorm has the highest time efficiency on the real data.
GDNorm is implemented in C++ and available at http://www.cs.ucr.edu/~yyang027/gdnorm.htm

© 2014 Springer-Verlag Berlin Heidelberg.
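The abstract does not give the exact form of the model, so the following is only a minimal sketch of the general idea it describes: a Poisson regression fitted by gradient-based optimization, with spatial distance as an explicit covariate alongside a bias term. It is not GDNorm's actual model or algorithm. The log-linear form, the feature layout `[1, log(bias), log(distance)]`, the function name `fit_poisson_gd`, the step size, the iteration count, and the toy data are all assumptions made for illustration.

```python
import math


def fit_poisson_gd(features, counts, lr=0.05, steps=5000):
    """Fit log(mu) = w . x by gradient ascent on the mean Poisson log-likelihood.

    Hypothetical helper for illustration; not taken from the GDNorm source.
    """
    n, k = len(features), len(features[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in zip(features, counts):
            # Expected count under the current weights.
            mu = math.exp(sum(wj * xj for wj, xj in zip(w, x)))
            # Gradient of the Poisson log-likelihood for one observation.
            for j in range(k):
                grad[j] += (y - mu) * x[j]
        # Step uphill, averaging the gradient over all observations.
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy, noise-free data (assumed values): counts grow with a bias factor
# and decay with spatial distance.
true_w = [1.0, 0.5, -1.0]  # intercept, bias weight, distance exponent
rows = [
    [1.0, math.log(b), math.log(d)]
    for b in (0.5, 1.0, 2.0)
    for d in (1.0, 2.0, 4.0)
]
counts = [math.exp(sum(w * x for w, x in zip(true_w, row))) for row in rows]

fitted = fit_poisson_gd(rows, counts)
print([round(w, 3) for w in fitted])
```

Because the toy counts are generated without noise from the assumed model, the fitted weights should approach `true_w`. Real Hi-C read counts are noisy integers, and the spatial distances are not observed directly, which is part of what makes the actual problem harder than this sketch.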

Cite

Yang, E. W., & Jiang, T. (2014). GDNorm: An improved Poisson regression model for reducing biases in Hi-C data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8701 LNBI, pp. 263–280). Springer Verlag. https://doi.org/10.1007/978-3-662-44753-6_20
