Abstract
Molecular pre-training, which aims to learn effective molecular representations from large amounts of data, has attracted substantial attention in cheminformatics and bioinformatics. A molecule can be viewed either as a graph (where atoms are connected by bonds) or as a SMILES sequence (obtained by traversing the molecular graph depth-first under specific rules). Transformers and graph neural networks (GNNs) are two representative architectures for sequential and graph data, respectively; they model molecules globally and locally and are therefore expected to be complementary. In this work, we propose to leverage both representations and design a new pre-training algorithm, dual-view molecule pre-training (DVMP for short), which effectively combines the strengths of both types of molecular representation. DVMP has a Transformer branch and a GNN branch, and the two branches are pre-trained to maintain the semantic consistency of molecules across views. After pre-training, either the Transformer branch (recommended based on our empirical results), the GNN branch, or both can be used for downstream tasks. DVMP is evaluated on 11 molecular property prediction tasks and outperforms strong baselines. Furthermore, we test DVMP on three retrosynthesis tasks, where it achieves state-of-the-art results. Our code is released at https://github.com/microsoft/DVMP.
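To make the dual-view idea concrete, below is a minimal sketch (not the authors' implementation) of the cross-view consistency objective the abstract describes: a toy Transformer encodes the SMILES view, a toy message-passing GNN encodes the graph view, and a simple cosine-similarity loss pulls the two pooled embeddings of the same molecule together. The encoder sizes, pooling, and the cosine loss are illustrative assumptions; the actual DVMP pre-training also includes masked-token and masked-atom objectives described in the paper.

```python
# Hedged sketch of dual-view consistency pre-training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmilesTransformer(nn.Module):
    """Global (sequence) view: encodes SMILES token ids."""
    def __init__(self, vocab_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids):                 # (B, L)
        h = self.encoder(self.embed(token_ids))   # (B, L, d)
        return h.mean(dim=1)                      # mean-pool to (B, d)


class GraphEncoder(nn.Module):
    """Local (graph) view: simple adjacency-based message passing."""
    def __init__(self, atom_feat_dim=16, d_model=128, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(atom_feat_dim, d_model)
        self.layers = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_layers)
        )

    def forward(self, atom_feats, adj):           # (B, N, F), (B, N, N)
        h = self.proj(atom_feats)
        for layer in self.layers:
            h = F.relu(layer(adj @ h) + h)        # aggregate neighbour messages
        return h.mean(dim=1)                      # mean-pool to (B, d)


def consistency_loss(z_seq, z_graph):
    """Pull the two views of the same molecule together (cosine loss)."""
    return 1.0 - F.cosine_similarity(z_seq, z_graph, dim=-1).mean()


if __name__ == "__main__":
    B, L, N = 8, 32, 20                           # batch, seq length, atoms
    seq_enc, graph_enc = SmilesTransformer(), GraphEncoder()
    tokens = torch.randint(0, 64, (B, L))         # toy SMILES token ids
    atoms = torch.randn(B, N, 16)                 # toy atom features
    adj = torch.rand(B, N, N).round()             # toy adjacency matrix
    loss = consistency_loss(seq_enc(tokens), graph_enc(atoms, adj))
    loss.backward()
    print(float(loss))
```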
Citation
Zhu, J., Xia, Y., Wu, L., Xie, S., Zhou, W., Qin, T., … Liu, T. Y. (2023). Dual-view Molecular Pre-training. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3615–3627). Association for Computing Machinery. https://doi.org/10.1145/3580305.3599317