A Systematic Survey of Chemical Pre-trained Models

Abstract

Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), in which DNNs are pre-trained on large-scale unlabeled molecular databases and then fine-tuned on specific downstream tasks. Despite this progress, the fast-growing field still lacks a systematic review. In this paper, we present the first survey summarizing the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both the machine learning and scientific communities.

Citation (APA)

Xia, J., Zhu, Y., Du, Y., & Li, S. Z. (2023). A Systematic Survey of Chemical Pre-trained Models. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2023-August, pp. 6787–6795). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2023/760
