Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
CITATION STYLE
Roche, R., Moussad, B., Shuvo, M. H., Tarafder, S., & Bhattacharya, D. (2024). EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Research, 52(5), E27. https://doi.org/10.1093/nar/gkae039