Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction, optimization, and materials discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings, where obtaining labels requires substantial computational or labor resources. Noting that problems in the natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework that incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently reduces the number of labels needed to reach a given network accuracy by orders of magnitude.
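To make the core idea concrete, here is a minimal NumPy sketch of contrastive pretraining with an invariance augmentation, in the spirit of the framework described above. This is not the authors' implementation: the linear "encoder", the choice of a random 90-degree rotation as the symmetry augmentation, and all names (`augment`, `embed`, `nt_xent`) are illustrative assumptions. Two augmented views of each (unlabeled or surrogate) sample form a positive pair, and a standard NT-Xent/InfoNCE loss pulls their embeddings together.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    """Invariance augmentation: random 90-degree rotation, an illustrative
    symmetry of a square-lattice 2D unit cell (assumption, not the paper's exact choice)."""
    return np.rot90(x, rng.integers(0, 4))

def embed(batch, W):
    """Toy linear encoder standing in for a neural network, with L2-normalized output."""
    z = batch.reshape(len(batch), -1) @ W.T
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (InfoNCE) contrastive loss over a batch of positive pairs."""
    z = np.concatenate([z1, z2])                  # (2N, d) stacked views
    sim = z @ z.T / temperature                   # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each view's partner
    lse = np.log(np.exp(sim).sum(axis=1))         # log-sum-exp over candidates
    return float(np.mean(lse - sim[np.arange(2 * n), pos]))

# Stand-in for abundant unlabeled or near-zero-cost surrogate unit cells.
cells = rng.random((8, 16, 16))
views1 = np.stack([augment(c) for c in cells])
views2 = np.stack([augment(c) for c in cells])
W = rng.standard_normal((32, 16 * 16))
loss = nt_xent(embed(views1, W), embed(views2, W))
```

In the full framework, the encoder pretrained this way would subsequently be fine-tuned on the small labeled dataset for the downstream regression task (e.g., density-of-states prediction).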
Citation
Loh, C., Christensen, T., Dangovski, R., Kim, S., & Soljačić, M. (2022). Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-31915-y