Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms

Abstract

Rapid solvent selection is of great significance in chemistry, yet solubility prediction remains a crucial challenge. This study aimed to develop machine learning models that accurately predict compound solubility in organic solvents. A dataset of 5081 experimental temperature and solubility data points for compounds in organic solvents was extracted and standardized. Molecular fingerprints were used to characterize structural features. LightGBM was compared with deep learning and traditional machine learning methods (PLS, Ridge regression, kNN, DT, ET, RF, SVM) for predicting solubility in organic solvents at different temperatures. Compared with the other models, LightGBM exhibited significantly better overall generalization (logS ± 0.20). For unseen solutes, our model gave a prediction accuracy (logS ± 0.59) close to the expected noise level of experimental solubility data. LightGBM also revealed the physicochemical relationship between solubility and structural features. Our method enables rapid solvent screening in chemistry and may be applied to solubility prediction in other solvents.
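The distinction between overall generalization and performance on unseen solutes depends on how the data are split. The following is a minimal sketch, not the authors' code, of a solute-grouped hold-out split in plain Python: every record for a held-out solute goes to the test set, so no test solute is seen during training. The record fields (`solute`, `solvent`, `temperature_K`, `logS`), the function name `split_by_solute`, and the toy values are all assumptions made for illustration; the values are placeholders, not measurements.

```python
import random
from collections import defaultdict


def split_by_solute(records, test_fraction=0.2, seed=0):
    """Hold out whole solutes so that no test solute appears in training.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Group all records (any solvent, any temperature) under their solute.
    by_solute = defaultdict(list)
    for rec in records:
        by_solute[rec["solute"]].append(rec)

    # Shuffle the solute names reproducibly, then cut off the test share.
    solutes = sorted(by_solute)
    random.Random(seed).shuffle(solutes)
    n_test = max(1, round(len(solutes) * test_fraction))

    test = [r for s in solutes[:n_test] for r in by_solute[s]]
    train = [r for s in solutes[n_test:] for r in by_solute[s]]
    return train, test


# Toy placeholder records (made-up values, NOT experimental data).
records = [
    {"solute": "A", "solvent": "ethanol", "temperature_K": 298.15, "logS": -1.0},
    {"solute": "A", "solvent": "acetone", "temperature_K": 308.15, "logS": -0.8},
    {"solute": "B", "solvent": "ethanol", "temperature_K": 298.15, "logS": -2.1},
    {"solute": "C", "solvent": "toluene", "temperature_K": 313.15, "logS": -0.5},
    {"solute": "D", "solvent": "ethanol", "temperature_K": 303.15, "logS": -1.7},
    {"solute": "E", "solvent": "acetone", "temperature_K": 298.15, "logS": -3.0},
]

train, test = split_by_solute(records, test_fraction=0.2, seed=0)
train_solutes = {r["solute"] for r in train}
test_solutes = {r["solute"] for r in test}
```

A plain random split over records would typically place the same solute (in a different solvent or at a different temperature) on both sides, which is the easier setting; grouping by solute gives the stricter unseen-solute estimate.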

Citation

Ye, Z., & Ouyang, D. (2021). Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms. Journal of Cheminformatics, 13(1). https://doi.org/10.1186/s13321-021-00575-3
