A novel sequence-based feature for the identification of DNA-binding sites in proteins using Jensen-Shannon divergence

Abstract

Knowledge of protein-DNA interactions is essential for fully understanding the molecular activities of life. Many research groups have developed tools, based on either structure or sequence, to predict the DNA-binding residues in proteins. Structure-based methods usually achieve good results but require knowledge of the protein's 3D structure, whereas sequence-based methods can be applied to proteins at high throughput but require good features. In this study, we present a new information-theoretic feature derived from the Jensen-Shannon divergence (JSD) between the amino acid distribution of a site and the background distribution of non-binding sites. The new feature indicates how much a given site differs from a non-binding site, and is therefore informative for detecting binding sites in proteins. We conduct the study with five-fold cross-validation on 263 proteins using the Random Forest classifier. We evaluate the usefulness of our new features by combining them with popular existing features such as the position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). Adding our features significantly boosts the performance of the Random Forest classifier, with a clear increase in sensitivity and Matthews correlation coefficient (MCC).
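To make the idea behind the feature concrete, the following is a minimal sketch of the Jensen-Shannon divergence between the amino acid distribution of one alignment column and a background distribution. It is an illustration only, not the authors' implementation: the function names, the pseudocount, the equal weighting of the two distributions, and the uniform background used in the usage lines are all assumptions. In the study, the background is the distribution of non-binding sites.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and unknown symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies for one alignment column.

    The pseudocount is an assumption of this sketch; it keeps every
    probability strictly positive.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD with equal weights (an assumption); lies in [0, 1] for log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical usage: a uniform background stands in for the
# non-binding-site distribution, which would be estimated from data.
background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
site = column_distribution("KKKRRKKR")  # made-up column rich in K and R
score = jensen_shannon_divergence(site, background)
```

A site whose distribution matches the background scores close to 0, and the score grows as the site's composition departs from the background. That is the sense in which the feature measures how different a site is from a non-binding site.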

Cite

Dang, T. K. L., Meckbach, C., Tacke, R., Waack, S., & Gültas, M. (2016). A novel sequence-based feature for the identification of DNA-binding sites in proteins using Jensen-Shannon divergence. Entropy, 18(10). https://doi.org/10.3390/e18100379
