Hydration free energies from kernel-based machine learning: Compound-database bias


Abstract

We consider the prediction of a basic thermodynamic property, the hydration free energy, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work on learning electronic properties but differs in key aspects. First, the representation is averaged over several conformers to account for the statistical ensemble. Second, we include an atomic-decomposition ansatz, which offers significantly better transferability than molecular learning. Finally, we explore the existence of severe biases in databases of experimental compounds. By combining dimensionality reduction with cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of a narrow chemical range.
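As a rough illustration of the conformer-averaging idea in the abstract, the sketch below averages per-conformer feature vectors into one vector per molecule and fits a kernel ridge regression on the result. It makes several assumptions that the abstract does not confirm: the Gaussian kernel, the ridge regularizer, the random stand-in features, the toy target, and all function names (`conformer_averaged`, `gaussian_kernel`, `fit_krr`, `predict_krr`) are hypothetical and chosen for illustration only. The atomic-decomposition ansatz is not shown.

```python
import numpy as np

def conformer_averaged(reps):
    """Average per-conformer feature vectors, shape (n_conformers, n_features)."""
    return np.mean(np.asarray(reps, dtype=float), axis=0)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between the rows of A and the rows of B (assumed kernel choice)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_new, sigma):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy data: 20 hypothetical molecules, 4 conformers each, 5 random features.
rng = np.random.default_rng(0)
conformer_features = rng.normal(size=(20, 4, 5))
X = np.array([conformer_averaged(mol) for mol in conformer_features])
y = X.sum(axis=1)  # synthetic target, not a real hydration free energy

alpha = fit_krr(X, y, sigma=1.0, lam=1e-8)
pred_train = predict_krr(X, alpha, X, sigma=1.0)
```

With a very small regularizer the model nearly interpolates its training targets, so a meaningful test of learning rate or database bias would evaluate on held-out molecules, ideally drawn from a different compound set than the training data.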

Cite

Rauer, C., & Bereau, T. (2020). Hydration free energies from kernel-based machine learning: Compound-database bias. Journal of Chemical Physics, 153(1). https://doi.org/10.1063/5.0012230
