A novel sequence-based feature for the identification of DNA-binding sites in proteins using Jensen-Shannon divergence

Abstract

Knowledge of protein-DNA interactions is essential for fully understanding the molecular activities of life. Many research groups have developed tools to predict DNA-binding residues in proteins, using either structure-based or sequence-based approaches. Structure-based methods usually achieve good results but require knowledge of the protein's 3D structure, whereas sequence-based methods can be applied to proteins in a high-throughput manner but depend on informative features. In this study, we present a new information-theoretic feature derived from the Jensen-Shannon divergence (JSD) between the amino acid distribution of a site and the background distribution of non-binding sites. The feature measures how much a given site differs from a non-binding site, which makes it informative for detecting binding sites in proteins. We conduct the study using five-fold cross-validation on 263 proteins with a Random Forest classifier. We evaluate the usefulness of our new features by combining them with other popular existing features such as the position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). Adding our features significantly boosts the performance of the Random Forest classifier, with a clear increase in sensitivity and Matthews correlation coefficient (MCC).
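As a rough illustration of the feature described above, the following Python sketch computes the standard equal-weight Jensen-Shannon divergence, JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2, between the amino acid distribution at one site and a background distribution of non-binding sites. Several details are assumptions, not the authors' method: the site distribution is estimated from a hypothetical alignment column of homologous sequences, both distributions are smoothed with an additive pseudocount of 1, and logarithms are base 2 so the score lies in [0, 1]. The helper names `distribution`, `kl_divergence` and `jensen_shannon` and the toy sequences are illustrative only; the paper's exact weighting and smoothing may differ.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols (gaps, X) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def distribution(residues, pseudocount=1.0):
    """Amino acid frequency vector with additive pseudocounts (assumed smoothing)."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Equal-weight Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Toy data, not from the paper: residues observed at non-binding positions.
    background = distribution("ALGVSTEDKQALGVSTNDEKAL")
    # Hypothetical alignment column at one query position, rich in Arg/Lys.
    site_column = "RRKRKRHRRK"
    score = jensen_shannon(distribution(site_column), background)
    print(f"JSD feature for this site: {score:.3f}")
```

A score near 0 means the site's residue composition resembles the non-binding background, while a score near 1 means it differs strongly, which is the intuition the abstract gives for why the feature helps flag binding sites. In a full pipeline such per-site scores would be appended to the PSSM, OBV and SS features before training the classifier; that wiring is not shown here.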

Citation (APA)

Dang, T. K. L., Meckbach, C., Tacke, R., Waack, S., & Gültas, M. (2016). A novel sequence-based feature for the identification of DNA-binding sites in proteins using Jensen-Shannon divergence. Entropy, 18(10). https://doi.org/10.3390/e18100379
