Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities

Heng Zhang; Jianfeng Yan; Zhike Lu; Yangfan Zhou; Qingfeng Zhang; Tingting Cui; Yini Li; Hui Chen; Lijia Ma

Journal Article (Open Access)

Cell Discovery (2023) 9(1)

DOI: 10.1038/s41421-023-00549-9

Abstract

Life science studies involving clustered regularly interspaced short palindromic repeat (CRISPR) editing generally apply the best-performing guide RNA (gRNA) for a gene of interest. Computational models, combined with massive experimental quantification on synthetic gRNA-target libraries, are used to accurately predict gRNA activity and mutational patterns. However, measurements are inconsistent between studies because of differences in the design of the gRNA-target pair constructs, and no integrated investigation has yet examined multiple facets of gRNA capacity concurrently. In this study, we analyzed DNA double-strand break (DSB)-induced repair outcomes and measured SpCas9/gRNA activities at both matched and mismatched locations using 926,476 gRNAs covering 19,111 protein-coding genes and 20,268 non-coding genes. By deep sampling and massively quantifying gRNA capabilities in K562 cells, we obtained a uniformly collected and processed dataset, from which we developed machine learning models to predict the on-target cleavage efficiency (AIdit_ON), off-target cleavage specificity (AIdit_OFF), and mutational profiles (AIdit_DSB) of SpCas9/gRNA. When benchmarked against previous models on independent datasets, each of these models showed superior performance in predicting SpCas9/gRNA activities. We also empirically determined a previously unknown parameter: the “sweet spot” in dataset size for establishing an effective model of gRNA capabilities at a manageable experimental scale. In addition, we observed cell type-specific mutational profiles and identified nucleotidylexotransferase as the key factor driving these outcomes. These massive datasets and deep learning algorithms have been implemented in the user-friendly web service http://crispr-aidit.com to evaluate and rank gRNAs for life science studies.
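The abstract distinguishes SpCas9/gRNA activity at matched (on-target) and mismatched (off-target) locations. As a minimal illustration of what a "mismatched location" means, the hypothetical helper `mismatch_positions` below lists where a 20-nt spacer differs from a candidate genomic target. It assumes the standard SpCas9 layout of a 20-nt protospacer followed by an NGG PAM. It is only a sketch: it is not part of AIdit_OFF or the crispr-aidit.com service, and it does not predict cleavage.

```python
# Hypothetical helper for illustration only; not the AIdit_OFF model.
# Assumes a 20-nt SpCas9 spacer and a 23-nt target (20-nt protospacer + 3-nt PAM).

def mismatch_positions(spacer: str, target: str) -> list[tuple[int, str, str]]:
    """Return (position, gRNA base, target base) for each mismatch.

    Positions are 1-based, counted from the PAM-distal (5') end of the spacer.
    """
    spacer, target = spacer.upper(), target.upper()
    if len(spacer) != 20 or len(target) != 23:
        raise ValueError("expected a 20-nt spacer and a 23-nt target")
    protospacer, pam = target[:20], target[20:]
    # SpCas9 recognizes an NGG PAM immediately 3' of the protospacer.
    if pam[1:] != "GG":
        raise ValueError(f"PAM {pam!r} is not NGG")
    # Compare the spacer and protospacer base by base.
    return [
        (i + 1, g, t)
        for i, (g, t) in enumerate(zip(spacer, protospacer))
        if g != t
    ]
```

For example, with the spacer `GACGCATAAAGATGAGACGC`, the perfectly matched target `GACGCATAAAGATGAGACGCTGG` yields an empty list, whereas `GACGCATAAAGATGAGACTCAGG` yields `[(19, 'G', 'T')]`, a single PAM-proximal mismatch.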

Citation (APA)

Zhang, H., Yan, J., Lu, Z., Zhou, Y., Zhang, Q., Cui, T., … Ma, L. (2023). Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities. Cell Discovery, 9(1). https://doi.org/10.1038/s41421-023-00549-9
