Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network

Elham Khalili; Shahin Ramazi; Faezeh Ghanati; Samaneh Kouchaki

Journal Article, Open Access

Briefings in Bioinformatics (2022) 23(2)

DOI: 10.1093/bib/bbac015

Abstract

Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant function through its impact on signaling, gene expression, enzyme kinetics, protein stability and protein interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital because abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM prediction suffer from high computational cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance, interpretable deep tabular learning network (TabNet), to improve the prediction of protein p-sites in soybean. Moreover, we use, for the first time, a hybrid feature set of sequence-based features, physicochemical properties and position-specific scoring matrices (PSSMs) to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean. The experimentally verified p-sites of soybean proteins are collected from the Eukaryotic Phosphorylation Sites Database and the database of post-translational modifications (dbPTM). We then remove redundant positive and negative samples by discarding protein sequences with >40% sequence similarity. The developed techniques achieve >70% accuracy. TabNet is the best-performing classifier: using hybrid features and a window size of 13, it achieves 78.96% sensitivity and 77.24% specificity. These results indicate that the TabNet method offers both high performance and interpretability. The proposed technique can analyze the data automatically, without measurement error or human intervention, and can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
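The abstract describes building samples around Ser, Thr and Tyr residues with a window size of 13 and evaluating the classifiers by sensitivity and specificity. The Python sketch below illustrates those two steps under stated assumptions: the window is centred on the candidate residue (6 residues on each side), site positions are 0-based, and windows that run past a sequence end are filled with a hypothetical padding symbol `X`. It is not the authors' code (theirs is in the linked repository), and the function names are hypothetical; feature encoding (sequence-based, physicochemical, PSSM) and the TabNet model itself are omitted.

```python
import math
from typing import List, Set, Tuple

# Residues whose phosphorylation is predicted (Ser, Thr, Tyr).
TARGET_RESIDUES = "STY"
# Hypothetical padding symbol for windows that run past a sequence end.
PAD = "X"


def extract_windows(
    sequence: str, psites: Set[int], window_size: int = 13
) -> List[Tuple[int, str, int]]:
    """Return (position, window, label) for every S/T/Y in `sequence`.

    Positions are 0-based (an assumption). `psites` holds the positions of
    experimentally verified p-sites; every other S/T/Y is labelled negative.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site is centred")
    half = window_size // 2
    padded = PAD * half + sequence + PAD * half
    samples = []
    for pos, residue in enumerate(sequence):
        if residue in TARGET_RESIDUES:
            # Residue `pos` sits at index pos + half in the padded string,
            # so the centred window is padded[pos : pos + window_size].
            window = padded[pos : pos + window_size]
            label = 1 if pos in psites else 0
            samples.append((pos, window, label))
    return samples


def sensitivity_specificity(
    y_true: List[int], y_pred: List[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Returns NaN for a metric whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec


if __name__ == "__main__":
    # Toy sequence with one verified p-site at position 2 (the S).
    for sample in extract_windows("MASTKY", {2}):
        print(sample)
    print(sensitivity_specificity([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

With the default window of 13, the serine at position 2 of the toy sequence `MASTKY` yields the positive sample `(2, "XXXXMASTKYXXX", 1)`, and the threonine and tyrosine become negatives. Centring the window keeps the candidate residue at a fixed index, so every sample has the same length and can be encoded into a fixed-width tabular row for a model such as TabNet.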

Citation (APA)

Khalili, E., Ramazi, S., Ghanati, F., & Kouchaki, S. (2022). Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network. Briefings in Bioinformatics, 23(2). https://doi.org/10.1093/bib/bbac015