Abstract
Molecular pre-training, which aims to learn effective molecular representations from large amounts of data, has attracted substantial attention in cheminformatics and bioinformatics. A molecule can be viewed either as a graph (where atoms are connected by bonds) or as a SMILES sequence (obtained by traversing the molecular graph depth-first under specific rules). Transformers and graph neural networks (GNNs) are two representative architectures for sequential and graph data, respectively; they model molecules globally and locally and are therefore expected to be complementary. In this work, we propose to leverage both representations and design a new pre-training algorithm, dual-view molecule pre-training (DVMP for short), which effectively combines the strengths of both types of molecular representation. DVMP has a Transformer branch and a GNN branch, and the two branches are pre-trained to maintain the semantic consistency of molecules across views. After pre-training, either the Transformer branch (recommended based on our empirical results), the GNN branch, or both can be used for downstream tasks. DVMP is evaluated on 11 molecular property prediction tasks and outperforms strong baselines. Furthermore, we test DVMP on three retrosynthesis tasks, where it achieves state-of-the-art results. Our code is released at https://github.com/microsoft/DVMP.
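To make the dual-view idea concrete, below is a minimal sketch (not the authors' implementation) of the cross-view consistency objective the abstract describes: a toy Transformer encodes the SMILES view, a toy message-passing GNN encodes the graph view, and a simple cosine-similarity loss pulls the two pooled embeddings of the same molecule together. The encoder sizes, pooling, and the cosine loss are illustrative assumptions; the actual DVMP pre-training also includes masked-token and masked-atom objectives described in the paper.

```python
# Hedged sketch of dual-view consistency pre-training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmilesTransformer(nn.Module):
    """Global (sequence) view: encodes SMILES token ids."""
    def __init__(self, vocab_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids):                 # (B, L)
        h = self.encoder(self.embed(token_ids))   # (B, L, d)
        return h.mean(dim=1)                      # mean-pool to (B, d)


class GraphEncoder(nn.Module):
    """Local (graph) view: simple adjacency-based message passing."""
    def __init__(self, atom_feat_dim=16, d_model=128, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(atom_feat_dim, d_model)
        self.layers = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_layers)
        )

    def forward(self, atom_feats, adj):           # (B, N, F), (B, N, N)
        h = self.proj(atom_feats)
        for layer in self.layers:
            h = F.relu(layer(adj @ h) + h)        # aggregate neighbour messages
        return h.mean(dim=1)                      # mean-pool to (B, d)


def consistency_loss(z_seq, z_graph):
    """Pull the two views of the same molecule together (cosine loss)."""
    return 1.0 - F.cosine_similarity(z_seq, z_graph, dim=-1).mean()


if __name__ == "__main__":
    B, L, N = 8, 32, 20                           # batch, seq length, atoms
    seq_enc, graph_enc = SmilesTransformer(), GraphEncoder()
    tokens = torch.randint(0, 64, (B, L))         # toy SMILES token ids
    atoms = torch.randn(B, N, 16)                 # toy atom features
    adj = torch.rand(B, N, N).round()             # toy adjacency matrix
    loss = consistency_loss(seq_enc(tokens), graph_enc(atoms, adj))
    loss.backward()
    print(float(loss))
```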
Citation
Zhu, J., Xia, Y., Wu, L., Xie, S., Zhou, W., Qin, T., … Liu, T. Y. (2023). Dual-view Molecular Pre-training. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3615–3627). Association for Computing Machinery. https://doi.org/10.1145/3580305.3599317