Abstract
Training neural networks on small and imbalanced datasets often leads to overfitting and to the minority class being disregarded. Predictive toxicology, however, needs models with a good balance between sensitivity and specificity. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for toxicity prediction. Conformational oversampling enhances a dataset by generating multiple conformations of a molecule. These conformations can be used to balance as well as oversample a dataset, thereby increasing the dataset size without the need for artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
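The balancing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how many conformations are generated per molecule, so the ceiling-division rule below is an assumption, and `conformer_plan`, `oversample`, and `fake_embed` are hypothetical names introduced only for illustration. `fake_embed` is a stand-in for a real conformer generator.

```python
from collections import Counter


def conformer_plan(labels, majority_conformers=1):
    """Conformers to generate per molecule, keyed by class label.

    Assumption (not from the paper): every class is brought up to
    majority_size * majority_conformers samples, rounding up.
    """
    counts = Counter(labels)
    target = max(counts.values()) * majority_conformers
    # -(-a // b) is ceiling division on integers
    return {label: -(-target // n) for label, n in counts.items()}


def oversample(molecules, labels, plan, embed):
    """Expand each molecule into plan[label] conformers, keeping its label."""
    out = []
    for mol, label in zip(molecules, labels):
        for conformer in embed(mol, plan[label]):
            out.append((conformer, label))
    return out


def fake_embed(mol, n):
    """Hypothetical stand-in: tags the molecule with a conformer index."""
    return [(mol, i) for i in range(n)]


# Toy dataset: 90 inactive (0) and 10 active (1) molecules.
labels = [0] * 90 + [1] * 10
molecules = [f"mol{i}" for i in range(len(labels))]

plan = conformer_plan(labels)  # {0: 1, 1: 9}
balanced = oversample(molecules, labels, plan, fake_embed)
class_sizes = Counter(label for _, label in balanced)  # 90 samples per class
```

Raising `majority_conformers` above 1 oversamples both classes while keeping them balanced, which corresponds to the two uses described above: balancing the dataset and enlarging it. Every added sample is a conformation of a real molecule, so no artificial molecules are introduced.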
Cite
Hemmerich, J., Asilar, E., & Ecker, G. F. (2020). COVER: Conformational oversampling as data augmentation for molecules. Journal of Cheminformatics, 12(1). https://doi.org/10.1186/s13321-020-00420-z