G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data


Abstract

G-quadruplexes are non-B-DNA structures that form in the genome, facilitated by Hoogsteen bonds between guanines in one or more DNA strands. G-quadruplex function has been linked to various molecular and disease phenotypes, so researchers are interested in measuring G-quadruplex formation genome-wide. Experimentally measuring G-quadruplexes, however, is a long and laborious process, which makes computational prediction of G-quadruplex propensity from a given DNA sequence a long-standing challenge. Although high-throughput datasets now measure G-quadruplex propensity in the form of mismatch scores, extant methods to predict G-quadruplex formation either rely on small datasets or are based on domain-knowledge rules. We developed G4mismatch, a novel algorithm to accurately and efficiently predict G-quadruplex propensity for any genomic sequence. G4mismatch is based on a convolutional neural network trained on almost 400 million human genomic loci measured in a single G4-seq experiment. When tested on sequences from a held-out chromosome, G4mismatch, the first method to predict mismatch scores genome-wide, achieved a Pearson correlation of over 0.8. When benchmarked on independent datasets derived from various animal species, G4mismatch trained on human data predicted G-quadruplex propensity genome-wide with high accuracy (Pearson correlations greater than 0.7). Moreover, when the predicted mismatch scores were used to detect G-quadruplexes genome-wide, G4mismatch outperformed extant methods. Last, we demonstrate that the mechanism behind G-quadruplex formation can be deduced through a unique visualization of the principles learned by the model.
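The abstract says only that G4mismatch is a convolutional neural network that predicts mismatch scores and is evaluated by Pearson correlation; it gives no architecture details. As a hedged illustration of those two ideas, the pure-Python sketch below one-hot encodes a DNA sequence, scans it with a single hand-set convolutional filter that responds to GGG runs, and average-pools the activations into a toy score, then defines the standard Pearson correlation. The filter weights, `toy_score`, and every other name here are hypothetical stand-ins, not the authors' trained model.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one row per base, columns ordered A, C, G, T; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


# Hypothetical hand-set filter (not learned): width 3, weight 1.0 on G at every position.
# With bias -2.0, only a GGG run yields a positive activation (3 - 2 = 1).
G_RUN_FILTER = [[0.0, 0.0, 1.0, 0.0]] * 3
G_RUN_BIAS = -2.0


def conv_relu(x, kernel, bias):
    """Valid 1D convolution of a single filter over the one-hot matrix, followed by ReLU."""
    k = len(kernel)
    acts = []
    for i in range(len(x) - k + 1):
        s = bias + sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(4))
        acts.append(max(s, 0.0))
    return acts


def toy_score(seq):
    """Toy propensity score: global average pooling of the G-run filter activations."""
    acts = conv_relu(one_hot(seq), G_RUN_FILTER, G_RUN_BIAS)
    return sum(acts) / len(acts) if acts else 0.0


def pearson(xs, ys):
    """Pearson correlation between predicted and measured scores (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

For example, four of the 19 windows in "GGGTTAGGGTTAGGGTTAGGG" are GGG runs, so `toy_score` returns 4/19, while an AT repeat scores 0. In the held-out-chromosome evaluation the abstract describes, a function like `pearson` would compare predicted and measured G4-seq mismatch scores across loci; a real network learns many filters and further layers instead of using hand-set weights.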

Citation (APA):

Barshai, M., Engel, B., Haim, I., & Orenstein, Y. (2023). G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data. PLoS Computational Biology, 19(3). https://doi.org/10.1371/journal.pcbi.1010948