COVER: Conformational oversampling as data augmentation for molecules

30Citations
Citations of this article
44Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.

Cite

CITATION STYLE

APA

Hemmerich, J., Asilar, E., & Ecker, G. F. (2020). COVER: Conformational oversampling as data augmentation for molecules. Journal of Cheminformatics, 12(1). https://doi.org/10.1186/s13321-020-00420-z

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free