Hydration free energies from kernel-based machine learning: Compound-database bias

Clemens Rauer; Tristan Bereau

Journal Article, Open Access

Journal of Chemical Physics (2020) 153(1)

DOI: 10.1063/5.0012230

Abstract

We consider the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of a narrow chemical range.
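To make the idea of a conformer-averaged representation feeding a kernel model more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the feature vectors, the Gaussian kernel, the hyperparameters `sigma` and `lam`, and every function name are assumptions chosen for illustration, and the data are synthetic random numbers rather than hydration free energies.

```python
import numpy as np

def average_over_conformers(conformer_features):
    """Average per-conformer feature vectors, shape (n_conformers, n_features),
    into a single molecular feature vector (hypothetical representation)."""
    return np.asarray(conformer_features, dtype=float).mean(axis=0)

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)); kernel choice is an assumption."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for the weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets for new molecules from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic toy data: 40 "molecules", 5 conformers each, 3 features per conformer.
rng = np.random.default_rng(0)
conformers = rng.normal(size=(40, 5, 3))
X = np.stack([average_over_conformers(c) for c in conformers])
y = X.sum(axis=1)  # made-up target, NOT a hydration free energy

sigma, lam = 1.0, 1e-3  # illustrative hyperparameters, not tuned
alpha = krr_fit(X[:30], y[:30], sigma, lam)
pred = krr_predict(X[:30], alpha, X[30:], sigma)
print("mean absolute error on held-out toy data:", np.abs(pred - y[30:]).mean())
```

In a setting like the one the abstract describes, the rows of `X` would come from a physically motivated descriptor evaluated on simulated conformers, and the hyperparameters would be chosen by cross-validation; the sketch only shows where the ensemble average enters the pipeline.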
