Molecular Descriptors Property Prediction Using Transformer-Based Approach


Abstract

In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach involving pre-training and fine-tuning. In particular, it leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text-based representation system for molecules. During the pre-training stage, the model uses the Masked Language Model objective, widely used in natural language processing, to learn representations of the molecular chemical space. During the fine-tuning stage, the model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model performs comparably to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce computational overhead, we propose a new approach that takes advantage of 3D compound structures to calculate the attention score used in an end-to-end transformer model for predicting anti-malaria drug candidates. The results show that, using the proposed attention score, our end-to-end model achieves performance comparable to that of pre-trained models.
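To make the two-stage idea concrete, the sketch below shows one possible way to set up masked-token pre-training on SMILES strings followed by regression fine-tuning with a small PyTorch transformer encoder. The character-level vocabulary, hyperparameters, masking scheme, and mean-pooling head are illustrative assumptions, not the authors' implementation, and the 3D-distance-based attention score described in the abstract is not shown.

# Minimal sketch (assumptions noted above): MLM pre-training on SMILES,
# then a regression head for a downstream property-prediction task.
import torch
import torch.nn as nn

PAD, MASK = 0, 1
VOCAB = {c: i + 2 for i, c in enumerate("#()+-=123456789BCFHINOPS[]clnos")}
VOCAB_SIZE = len(VOCAB) + 2

def encode(smiles, max_len=64):
    # Character-level tokenization; unknown characters fall back to PAD (toy choice).
    ids = [VOCAB.get(c, PAD) for c in smiles][:max_len]
    return ids + [PAD] * (max_len - len(ids))

class SmilesEncoder(nn.Module):
    def __init__(self, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)   # pre-training head
        self.reg_head = nn.Linear(d_model, 1)            # fine-tuning head

    def forward(self, x):
        return self.encoder(self.emb(x))

def mlm_loss(model, batch, mask_prob=0.15):
    # Masked Language Model objective: hide random tokens and predict them back.
    labels = batch.clone()
    mask = (torch.rand(batch.shape) < mask_prob) & (batch != PAD)
    inputs = batch.masked_fill(mask, MASK)
    labels[~mask] = -100                                  # score only masked positions
    logits = model.mlm_head(model(inputs))
    return nn.functional.cross_entropy(logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1), ignore_index=-100)

def finetune_loss(model, batch, targets):
    # Regression fine-tuning: mean-pool token embeddings (padding included, for brevity).
    pooled = model(batch).mean(dim=1)
    return nn.functional.mse_loss(model.reg_head(pooled).squeeze(-1), targets)

# Toy usage with two example molecules and made-up regression targets.
model = SmilesEncoder()
batch = torch.tensor([encode("CCO"), encode("c1ccccc1O")])
print(mlm_loss(model, batch).item(), finetune_loss(model, batch, torch.tensor([0.5, 1.2])).item())

In practice, the pre-training loss would be minimized over a large unlabeled SMILES corpus before the fine-tuning loss is applied to the smaller labeled dataset, reusing the same encoder weights.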

Citation (APA)

Tran, T., & Ekenna, C. (2023). Molecular Descriptors Property Prediction Using Transformer-Based Approach. International Journal of Molecular Sciences, 24(15). https://doi.org/10.3390/ijms241511948
