Abstract
The development of powerful natural language models has improved the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder, which features a highly structured latent space that is trained to jointly generate sequences as well as predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence–function landscape of large labelled datasets and generate new molecules by optimizing within the latent space using gradient-based methods. We evaluate this approach on several publicly available protein datasets, including variant sets of anti-ranibizumab and green fluorescent protein. We observe a greater sequence optimization efficiency (increase in fitness per optimization step) using ReLSO compared with other approaches, where ReLSO more robustly generates high-fitness sequences. Furthermore, the attention-based relationships learned by the jointly trained ReLSO models provide a potential avenue towards sequence-level fitness attribution information.
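The core idea described above, a jointly trained fitness head that makes the latent space smooth enough for gradient-based traversal, can be illustrated with a toy sketch. The names, shapes, and the linear fitness head below are illustrative assumptions, not the paper's actual architecture (which uses a transformer autoencoder):

```python
import numpy as np

# Hypothetical sketch of ReLSO-style latent-space gradient ascent.
# A toy linear "fitness head" f(z) = w.z + b stands in for the jointly
# trained predictor; real ReLSO models learn this head alongside a
# transformer autoencoder.

rng = np.random.default_rng(0)
z_dim = 8
w = rng.normal(size=z_dim)  # toy fitness-head weights (assumed)
b = 0.5                     # toy bias (assumed)

def predict_fitness(z):
    """Predicted fitness of a latent code z under the toy head."""
    return z @ w + b

def latent_gradient_ascent(z0, steps=50, lr=0.05):
    """Move a latent code uphill on the predicted-fitness surface.

    For this linear head the gradient of f(z) with respect to z is
    simply w, so each step adds lr * w.
    """
    z = z0.copy()
    for _ in range(steps):
        z += lr * w
    return z

z_start = np.zeros(z_dim)          # starting latent code (e.g. an encoded sequence)
z_opt = latent_gradient_ascent(z_start)
f0 = predict_fitness(z_start)
f1 = predict_fitness(z_opt)
```

In the full method, the optimized latent code `z_opt` would then be passed through the decoder to generate a new candidate sequence; here the sketch only demonstrates the "increase in fitness per optimization step" traversal that the abstract highlights.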
Citation
Castro, E., Godavarthi, A., Rubinfien, J., Givechian, K., Bhaskar, D., & Krishnaswamy, S. (2022). Transformer-based protein generation with regularized latent space optimization. Nature Machine Intelligence, 4(10), 840–851. https://doi.org/10.1038/s42256-022-00532-1