Low-quality structural and interaction data improves binding affinity prediction via random forest

Abstract

Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function on low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments supporting this hypothesis. In this study, we investigated to what extent training a scoring function on data that include low-quality structural and binding measurements harms its predictive performance. We found that low-quality data are not only non-detrimental but actually beneficial for the predictive performance of machine-learning scoring functions, although the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions are unable to exploit additional training data effectively beyond an early threshold, regardless of data quality. This demonstrates that, for machine-learning scoring functions, exploiting a larger volume of data matters more than restricting training to a smaller set of higher-quality data.
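
The contrast between classical and machine-learning scoring functions is usually read off a learning curve: train each model on increasingly large training sets and track test-set performance. The block below is a minimal sketch of that kind of experiment, not the authors' protocol. It assumes scikit-learn is installed, uses LinearRegression as a stand-in for a linear classical scoring function and RandomForestRegressor for a random-forest scoring function, and trains on entirely synthetic descriptors and affinities (the hypothetical helper make_synthetic_complexes), so it says nothing about real binding data or data quality. Performance is measured with the Pearson correlation between predicted and true values, a common metric for this task.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def make_synthetic_complexes(n, n_features=10, seed=0):
    """Hypothetical stand-in for protein-ligand data: random descriptor
    vectors with a nonlinear synthetic 'affinity' plus noise.
    This is NOT real binding data."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, n_features))
    y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.5 * X[:, 3] ** 2
    y += rng.normal(scale=0.1, size=n)
    return X, y


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def learning_curve(train_sizes, n_test=500, seed=0):
    """Train a linear model and a random forest on nested training sets of
    increasing size; return (size, linear Rp, random-forest Rp) tuples."""
    X_pool, y_pool = make_synthetic_complexes(max(train_sizes), seed=seed)
    X_test, y_test = make_synthetic_complexes(n_test, seed=seed + 1)
    results = []
    for n in train_sizes:
        # Smaller training sets are prefixes of larger ones (nested sets).
        X_tr, y_tr = X_pool[:n], y_pool[:n]
        lin = LinearRegression().fit(X_tr, y_tr)
        rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
        results.append((n,
                        pearson(y_test, lin.predict(X_test)),
                        pearson(y_test, rf.predict(X_test))))
    return results


if __name__ == "__main__":
    for n, r_lin, r_rf in learning_curve([100, 400, 1600]):
        print(f"n={n:5d}  linear Rp={r_lin:.3f}  RF Rp={r_rf:.3f}")
```

On data with a nonlinear structure like this, a linear model's test correlation typically levels off once it has enough samples to fit its few coefficients, while a random forest can keep improving as data volume grows; the exact numbers depend on the synthetic setup and are not meant to reproduce the study's results.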

Citation

Li, H., Leung, K. S., Wong, M. H., & Ballester, P. J. (2015). Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules, 20(6), 10947–10962. https://doi.org/10.3390/molecules200610947
