We propose a new spectral framework for reliable training, scalable inference and interpretable explanation of the DNA repair outcome following Cas9 cutting. Our framework, dubbed CRISPRLAND, relies on an unexploited observation about the nature of the repair process: the landscape of DNA repair is highly sparse in the (Walsh-Hadamard) spectral domain. This observation enables our framework to address key shortcomings that limit the interpretability and scaling of current deep-learning-based DNA repair models. In particular, CRISPRLAND reduces the time to compute the full DNA repair landscape from a striking 5230 years to 1 week and the sampling complexity from 10¹² to 3 million guide RNAs with only a small loss in accuracy (R² ∼ 0.9). Our proposed framework is based on a divide-and-conquer strategy that uses a fast peeling algorithm to learn the DNA repair models. CRISPRLAND captures lower-degree features around the cut site, which enrich for short insertions and deletions, as well as higher-degree microhomology patterns, which enrich for longer deletions.
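To make the sparsity observation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a landscape defined over 10 binary variables (a hypothetical encoding; the abstract does not specify one) and builds it from five Walsh-Hadamard terms of degree 0 to 3. The names `fwht`, `sign`, and `toy_landscape` and all coefficient values are illustrative assumptions. The sketch computes the full transform by brute force, which is exactly the cost a peeling-based method is meant to avoid; it only shows what "sparse in the spectral domain" means.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform of a length-2^n list, scaled by 1/2^n."""
    a = [float(v) for v in values]
    size = len(a)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < size:
        # Butterfly step: combine entries whose indices differ in one bit.
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return [v / size for v in a]


def sign(index, bit):
    """Return (-1) raised to the given binary digit of index."""
    return 1 - 2 * ((index >> bit) & 1)


N_BITS = 10  # assumed toy size: 2^10 = 1024 possible binary sequences


def toy_landscape(x):
    """Hypothetical landscape: a constant, two degree-1 terms,
    one degree-2 term and one degree-3 term (made-up coefficients)."""
    return (0.5
            + 0.3 * sign(x, 0)
            - 0.2 * sign(x, 4)
            + 0.1 * sign(x, 1) * sign(x, 2)
            + 0.05 * sign(x, 3) * sign(x, 5) * sign(x, 7))


landscape = [toy_landscape(x) for x in range(2 ** N_BITS)]
spectrum = fwht(landscape)

# Indices of the nonzero spectral coefficients, and the degree of each
# (the number of interacting positions, i.e. set bits in the index).
support = [k for k, c in enumerate(spectrum) if abs(c) > 1e-9]
degrees = {k: bin(k).count("1") for k in support}

print(support)  # → [0, 1, 6, 16, 168]
print(degrees)  # → {0: 0, 1: 1, 6: 2, 16: 1, 168: 3}
```

Only 5 of the 1024 coefficients are nonzero in this toy case, and the number of set bits in each index gives the degree of the corresponding feature, which is the sense in which lower-degree and higher-degree features can be read off a sparse spectrum.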
Aghazadeh, A., Ocal, O., & Ramchandran, K. (2020). CRISPRLAND: Interpretable large-scale inference of DNA repair landscape based on a spectral approach. Bioinformatics, 36, I560–I568. https://doi.org/10.1093/BIOINFORMATICS/BTAA505