Analysis of protein features and machine learning algorithms for prediction of druggable proteins

Abstract

Background: Computational tools are widely used in the drug discovery process because they reduce time and cost. Predicting whether a protein is druggable is a fundamental step in the drug research pipeline, and sequence-based protein function prediction plays a vital role in many research areas. Training data, protein feature selection, and machine learning algorithms are the three indispensable elements that determine the success of such models. Methods: In this study, we tested the performance of different combinations of protein features and machine learning algorithms for druggable protein prediction, based on the targets of FDA-approved small molecules. We also enlarged the dataset to include the targets of small molecules that were experimental or under clinical investigation. Results: Although the 146-d vector used by Li et al. with a neural network achieved the best training accuracy (91.10%), overlapped 3-gram word2vec with logistic regression achieved the best prediction accuracy on the independent test set (89.55%) and on newly approved targets. Models were also trained on the enlarged dataset that included targets of small molecules in experimental and clinical investigation; however, the best training accuracy was only 75.48%. In addition, we applied our models to predict potential targets as a reference for future studies. Conclusions: Our study indicates the potential of word2vec for the prediction of druggable proteins. The training dataset of druggable proteins should not be extended to targets that lack verification. The target prediction package can be found at https://doi.org/github.com/pkumdl/target_prediction.
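To make the "overlapped 3-gram" representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each protein sequence is split into overlapping 3-residue words (stride 1) and that the word vectors are averaged into one fixed-length feature vector; the abstract does not state the pooling step, so the averaging is an assumption. The function names and the toy embedding table are hypothetical. In practice the vectors would come from a trained word2vec model, and the resulting features would be passed to a classifier such as logistic regression.

```python
from typing import Dict, List, Sequence


def overlapping_kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mean_embedding(
    words: Sequence[str], table: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of the words present in the table.

    Assumption: mean pooling is used to obtain a fixed-length vector.
    Words missing from the table are skipped; if none are found,
    a zero vector of length `dim` is returned.
    """
    known = [table[w] for w in words if w in table]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 2-d embedding table standing in for trained word2vec vectors.
toy_table = {"MKT": [1.0, 0.0], "KTA": [0.0, 1.0]}

words = overlapping_kmers("MKTAY")          # "TAY" is not in the toy table
features = mean_embedding(words, toy_table, dim=2)
```

Here a five-residue sequence produces three overlapping words, and only the two words present in the toy table contribute to the averaged feature vector.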

Cite

Sun, T., Lai, L., & Pei, J. (2018). Analysis of protein features and machine learning algorithms for prediction of druggable proteins. Quantitative Biology, 6(4), 334–343. https://doi.org/10.1007/s40484-018-0157-2
