Leveraging protein language models for cross-variant CRISPR/Cas9 sgRNA activity prediction

Abstract

Motivation: Accurate prediction of single-guide RNA (sgRNA) activity is crucial for optimizing the CRISPR/Cas9 gene-editing system, because sgRNA activity directly influences the efficiency and accuracy of genome modifications. However, existing methods mainly train variant-specific models on large-scale experimental data from a single Cas9 protein or variant. This limits their generalization and predictive performance across different Cas9 variants, as well as their scalability to newly discovered variants.

Results: In this study, we propose PLM-CRISPR, a novel deep learning model that leverages protein language models to learn representations of Cas9 proteins and their variants for cross-variant sgRNA activity prediction. PLM-CRISPR uses tailored feature extraction modules for sgRNA and protein sequences, and combines a cross-variant training strategy with a dynamic feature fusion mechanism to model their interactions. Extensive experiments on datasets spanning seven Cas9 variants in three real-world scenarios show that PLM-CRISPR outperforms existing methods, particularly in data-scarce settings, including novel variants with few or no training samples. Comparisons with traditional machine learning and deep learning models further confirm its effectiveness. Additionally, motif analysis shows that PLM-CRISPR accurately identifies high-activity sgRNA sequence patterns across diverse Cas9 variants. Overall, PLM-CRISPR provides a robust, scalable, and generalizable solution for sgRNA activity prediction across diverse Cas9 variants.
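The abstract does not specify how PLM-CRISPR encodes sgRNAs or how its dynamic feature fusion works. Purely as an illustration of the general idea, the sketch below assumes a one-hot sgRNA encoding, a mean-pooled per-residue protein embedding (a stand-in for protein language model output), and a generic element-wise gated fusion. All function names and the gating form are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

NUCLEOTIDES = "ACGT"


def one_hot_sgrna(seq: str) -> List[List[float]]:
    """Encode an sgRNA (or its DNA protospacer) as a len(seq) x 4 one-hot matrix.

    Assumption: a plain one-hot scheme; the paper's actual sgRNA encoding is not given.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA uracil like DNA thymine
    matrix = []
    for base in seq:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        matrix.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return matrix


def mean_pool(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average per-position vectors (e.g. per-residue embeddings) into one vector."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(sg_vec: Sequence[float], prot_vec: Sequence[float],
                 gate_weights: Sequence[Sequence[float]],
                 gate_bias: Sequence[float]) -> List[float]:
    """Hypothetical input-dependent fusion of sgRNA and protein features.

    g = sigmoid(W [s; p] + b), fused = g * s + (1 - g) * p (element-wise).
    gate_weights has one row per output dimension, each of length 2 * dim.
    Assumes both vectors were already projected to the same dimension.
    """
    if len(sg_vec) != len(prot_vec):
        raise ValueError("sgRNA and protein vectors must have the same dimension")
    concat = list(sg_vec) + list(prot_vec)
    fused = []
    for i in range(len(sg_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)  # per-dimension weight on the sgRNA feature
        fused.append(g * sg_vec[i] + (1.0 - g) * prot_vec[i])
    return fused


if __name__ == "__main__":
    # Toy sgRNA feature: base composition from mean-pooled one-hot rows.
    sg = mean_pool(one_hot_sgrna("GGCA"))
    # Toy per-residue "embeddings" standing in for protein language model output.
    prot = mean_pool([[0.2, 0.4, 0.6, 0.8], [0.0, 0.2, 0.4, 0.6]])
    zeros_w = [[0.0] * 8 for _ in range(4)]
    print(gated_fusion(sg, prot, zeros_w, [0.0] * 4))
```

With zero gate weights and bias every gate equals 0.5, so the fused vector is just the average of the two inputs. A trained gate would instead weight sgRNA and protein features differently for each dimension and each sgRNA/variant pair, which is one common way to make a fusion "dynamic".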

Citation (APA)

Hou, Y., Li, Y., Zheng, R., Zhang, F., Guo, F., Li, M., & Zeng, M. (2025). Leveraging protein language models for cross-variant CRISPR/Cas9 sgRNA activity prediction. Bioinformatics, 41(7). https://doi.org/10.1093/bioinformatics/btaf385