A nearsighted force-training approach to systematically generate training data for the machine learning of large atomic structures

Abstract

A challenge of atomistic machine-learning (ML) methods is ensuring that the training data are suitable for the system being simulated, which is especially difficult for systems with large numbers of atoms. Most atomistic ML approaches rely on the nearsightedness principle ("all chemistry is local"), using information about the positions of an atom's neighbors to predict a per-atom energy. In this work, we develop a framework that exploits the nearsighted nature of ML models to systematically produce an appropriate training set for large structures. We use a per-atom uncertainty estimate to identify the most uncertain atoms and extract chunks centered on these atoms. It is crucial that these small chunks are large enough both to satisfy the ML model's nearsightedness principle (that is, to fill the cutoff radius) and to be converged with respect to the electronic structure calculation. We present data indicating when the electronic structure calculations are converged with respect to the structure size, which fundamentally limits the accuracy of any nearsighted ML calculator. These new atomic chunks are computed with electronic structure calculations and, crucially, only a single force (that of the central atom) is added to the growing training set, preventing the noisy and irrelevant information from the chunk's boundary from interfering with ML training. The resulting ML potentials are robust, despite requiring single-point calculations on only small reference structures and never seeing large training structures. We demonstrate our approach via structure optimization of a 260-atom structure and extend it to clusters with up to 1415 atoms.
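The selection loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `radius` parameter, and the `reference_forces` callable (a stand-in for an electronic structure calculation) are all hypothetical, and the chunk is cut as a simple sphere around the central atom, which is an assumption made here for illustration.

```python
import math


def most_uncertain_atoms(uncertainties, n_select):
    """Return indices of the n_select atoms with the largest uncertainty."""
    order = sorted(range(len(uncertainties)),
                   key=lambda i: uncertainties[i], reverse=True)
    return order[:n_select]


def extract_chunk(positions, center, radius):
    """Return indices of atoms within `radius` of atom `center`.

    The central atom is always listed first. `radius` is a hypothetical
    parameter; per the abstract it must at least fill the ML cutoff radius
    and be large enough for the reference calculation to be converged.
    """
    members = [center]
    for i, p in enumerate(positions):
        if i != center and math.dist(p, positions[center]) <= radius:
            members.append(i)
    return members


def build_training_entries(positions, uncertainties, n_select, radius,
                           reference_forces):
    """Create one training entry per selected atom.

    `reference_forces` is a hypothetical callable standing in for an
    electronic structure calculation: it takes a list of positions and
    returns one force vector per atom. Only the central atom's force is
    kept; forces on the chunk boundary are discarded.
    """
    entries = []
    for center in most_uncertain_atoms(uncertainties, n_select):
        chunk = extract_chunk(positions, center, radius)
        chunk_positions = [positions[i] for i in chunk]
        forces = reference_forces(chunk_positions)
        entries.append({"positions": chunk_positions,
                        "central_force": forces[0]})
    return entries


# Toy usage: five atoms on a line and a placeholder "calculator".
positions = [(float(i), 0.0, 0.0) for i in range(5)]
uncertainties = [0.1, 0.5, 0.2, 0.9, 0.3]
toy_forces = lambda pos: [(float(k), 0.0, 0.0) for k in range(len(pos))]
entries = build_training_entries(positions, uncertainties, 2, 1.5, toy_forces)
```

Placing the central atom first in each chunk makes it straightforward to keep exactly one force per reference calculation, which mirrors the abstract's point that boundary forces are left out of the training set.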

Cite

Zeng, C., Chen, X., & Peterson, A. A. (2022). A nearsighted force-training approach to systematically generate training data for the machine learning of large atomic structures. Journal of Chemical Physics, 156(6). https://doi.org/10.1063/5.0079314
