COVER: Conformational oversampling as data augmentation for molecules

Jennifer Hemmerich; Ece Asilar; Gerhard F. Ecker

Journal Article (Open Access)

Journal of Cheminformatics (2020) 12(1)

DOI: 10.1186/s13321-020-00420-z

30 Citations

44 Readers

Abstract

Training neural networks on small, imbalanced datasets often leads to overfitting and to models that disregard the minority class. Predictive toxicology, however, needs models with a good balance between sensitivity and specificity. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for toxicity prediction. Conformational oversampling enhances a dataset by generating multiple conformations of a molecule. These conformations can be used to balance as well as oversample a dataset, thereby increasing the dataset size without the need for artificial samples. We show that conformational oversampling facilitates the training of neural networks and provides state-of-the-art results on the Tox21 dataset.
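
The balancing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `base` parameter, and the placeholder conformer generator are all assumptions made for illustration. A real pipeline would replace the placeholder with a 3D conformer generator from a cheminformatics toolkit. The sketch only shows how choosing a per-class number of conformers can balance a dataset and enlarge it at the same time.

```python
import math
from collections import Counter


def conformers_per_class(labels, base=1):
    """Return how many conformers to generate per molecule in each class.

    Hypothetical helper: the majority class gets `base` conformers per
    molecule, and every other class gets enough to reach roughly the
    same total number of samples.
    """
    counts = Counter(labels)
    target = max(counts.values()) * base
    return {cls: math.ceil(target / n) for cls, n in counts.items()}


def oversample(molecules, labels, base=1, make_conformers=None):
    """Expand each molecule into several (conformer, label) samples.

    `make_conformers(mol, k)` is an assumed callable returning k
    conformers of `mol`. The default is a stand-in that only tags the
    molecule with a conformer index; it does no 3D embedding.
    """
    if make_conformers is None:
        make_conformers = lambda mol, k: [(mol, i) for i in range(k)]
    k_by_class = conformers_per_class(labels, base)
    samples = []
    for mol, label in zip(molecules, labels):
        for conf in make_conformers(mol, k_by_class[label]):
            samples.append((conf, label))
    return samples


if __name__ == "__main__":
    # Toy data: nine inactive molecules and one active molecule.
    mols = [f"mol_{i}" for i in range(10)]
    labels = ["inactive"] * 9 + ["active"]
    balanced = oversample(mols, labels, base=2)
    print(Counter(label for _, label in balanced))
```

With `base=1` the dataset is only balanced; larger values of `base` also oversample the majority class, so the whole dataset grows while every sample still corresponds to a real molecule rather than a synthetic one.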

Cite (APA)

Hemmerich, J., Asilar, E., & Ecker, G. F. (2020). COVER: Conformational oversampling as data augmentation for molecules. Journal of Cheminformatics, 12(1). https://doi.org/10.1186/s13321-020-00420-z
