A model stacking framework for identifying DNA binding proteins by orchestrating multi-view features and classifiers

Abstract

Nowadays, various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most challenging tasks. Disclosing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and auto-cross covariance of secondary structures (AC_Struc), which are extracted from evolutionary information, sequence composition, physicochemical properties and predicted structural information. These features are fed into various loosely-coupled classifiers such as SVM and random forest. A logistic regression model is then applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various prediction models flexibly with good performance.
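
The stacking idea described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation: the feature arrays are random stand-ins for the four views, and the helper names (`make_base_models`, `fit_stack`, `predict_stack`), the hyperparameters, and the use of out-of-fold probabilities as inputs to the logistic regression are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


def make_base_models():
    # Hypothetical helper: one SVM and one random forest per view.
    # Hyperparameters are placeholders, not values from the paper.
    return {
        "svm": SVC(probability=True, random_state=0),
        "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    }


def fit_stack(views, y, cv=5):
    """Fit one base model per (view, classifier) pair plus a meta-model.

    views: dict mapping a view name to an (n_samples, n_features) array.
    Returns the fitted base models, the meta-model, and column names.
    """
    meta_columns, fitted_base, names = [], {}, []
    for view_name, X in views.items():
        for model_name, model in make_base_models().items():
            # Out-of-fold probabilities keep training labels from leaking
            # into the meta-model (an assumption of this sketch).
            oof = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
            meta_columns.append(oof[:, 1])
            fitted_base[(view_name, model_name)] = model.fit(X, y)
            names.append(f"{view_name}+{model_name}")
    meta = LogisticRegression().fit(np.column_stack(meta_columns), y)
    return fitted_base, meta, names


def predict_stack(fitted_base, meta, views):
    # Column order follows the insertion order used in fit_stack.
    cols = [m.predict_proba(views[v])[:, 1] for (v, _), m in fitted_base.items()]
    return meta.predict(np.column_stack(cols))


# Synthetic stand-ins for the four views. Only the 188D width follows
# from its name; the other dimensions are arbitrary.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
dims = [("Local_DPP", 20), ("188D", 188), ("PSSM_DWT", 25), ("AC_Struc", 15)]
views = {name: rng.normal(size=(120, d)) + 0.8 * y[:, None] for name, d in dims}

fitted_base, meta, names = fit_stack(views, y)
pred = predict_stack(fitted_base, meta, views)
# The meta-model coefficients give one weight per (view, classifier) pair.
weights = dict(zip(names, meta.coef_[0]))
```

In this sketch, the coefficients in `weights` play the role the abstract assigns to the logistic regression model: a rough indication of how much each view and classifier pair contributes to the final prediction.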

Cite

Liu, X. J., Gong, X. J., Yu, H., & Xu, J. H. (2018). A model stacking framework for identifying DNA binding proteins by orchestrating multi-view features and classifiers. Genes, 9(8). https://doi.org/10.3390/genes9080394
