SIPF: Sampling Method for Inverse Protein Folding

Tianfan Fu; Jimeng Sun

Conference Proceedings (Open Access)

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2022) 378-388

DOI: 10.1145/3534678.3539284

Abstract

Protein engineering has important applications in drug discovery. Inverse protein folding, a fundamental task in protein design, aims to generate a protein's amino acid sequence given a 3D graph structure. However, most existing methods for inverse protein folding are based on sequential generative models and are therefore limited in uncertainty quantification and in their ability to explore the entire protein space. To address these issues, we propose a sampling method for inverse protein folding (SIPF). Specifically, we formulate inverse protein folding as a sampling problem and design two pretrained neural networks as the Markov chain Monte Carlo (MCMC) proposal distribution. To ensure sampling efficiency, we further design (i) an adaptive sampling scheme to select variables for sampling and (ii) an approximate target distribution as a surrogate for the unavailable target distribution. Empirical studies validate the effectiveness of SIPF, which achieves a 7.4% relative improvement in recovery rate and a 6.4% relative reduction in perplexity compared to the best baseline.
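To make the sampling formulation concrete, the following is a minimal Metropolis-Hastings sketch over amino acid sequences. It is not the SIPF implementation. The abstract does not specify the two proposal networks, the adaptive variable-selection scheme, or the approximate target, so this sketch substitutes a uniform single-site proposal and a toy target that rewards matches to a reference sequence. All names (`toy_log_target`, `mh_sample`, `recovery_rate`) and the weight `5.0` are hypothetical and chosen only for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_log_target(seq, reference, weight=5.0):
    """Hypothetical stand-in for a target distribution (assumption):
    unnormalized log-probability that rewards matches to a reference."""
    return weight * sum(a == b for a, b in zip(seq, reference))


def mh_sample(reference, n_steps=5000, seed=0):
    """Metropolis-Hastings with a uniform single-site proposal.

    Assumption: positions and residues are drawn uniformly. The paper
    instead describes learned proposals and adaptive position selection.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in reference]
    log_p = toy_log_target(seq, reference)
    for _ in range(n_steps):
        pos = rng.randrange(len(seq))
        proposal = seq.copy()
        proposal[pos] = rng.choice(AMINO_ACIDS)
        log_p_new = toy_log_target(proposal, reference)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, target(new) / target(old)).
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            seq, log_p = proposal, log_p_new
    return "".join(seq)


def recovery_rate(seq, reference):
    """Fraction of positions where the sampled residue matches the reference."""
    return sum(a == b for a, b in zip(seq, reference)) / len(reference)
```

A call such as `recovery_rate(mh_sample("MKTAYIAKQR"), "MKTAYIAKQR")` scores one sampled sequence against the toy reference. In a real setting the target would depend on the 3D structure rather than on a known sequence.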

Citation (APA)

Fu, T., & Sun, J. (2022). SIPF: Sampling Method for Inverse Protein Folding. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 378–388). Association for Computing Machinery. https://doi.org/10.1145/3534678.3539284
