PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction

Abstract

Recent advances in language models have provided the protein modeling community with a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in large language models, we present PeptideBERT, a protein language model tailored to predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT builds on the pretrained ProtBERT transformer model, which has 12 attention heads and 12 hidden layers. By fine-tuning the pretrained model on the three downstream tasks, our model achieves state-of-the-art (SOTA) performance in predicting hemolysis, which is crucial for determining a peptide's potential to induce red blood cell lysis, as well as nonfouling properties. Leveraging primarily shorter sequences and a data set in which negative samples are predominantly insoluble peptides, our model showcases remarkable performance.
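
To make the fine-tuning setup concrete, the sketch below loads the publicly available ProtBERT checkpoint ("Rostlab/prot_bert" on the Hugging Face hub) and attaches a binary classification head for a property such as hemolysis. This is a minimal illustration, not the authors' released code: the example peptide, the label, and all hyperparameters are assumptions, and the paper's exact training pipeline may differ.

# A minimal sketch, assuming the Hugging Face checkpoint "Rostlab/prot_bert":
# fine-tuning ProtBERT for binary peptide property classification,
# e.g., hemolytic vs. non-hemolytic.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("Rostlab/prot_bert", num_labels=2)

# ProtBERT expects space-separated single-letter amino acid codes.
peptide = "G L F D I V K K V V G A L G"  # illustrative sequence, not from the paper
inputs = tokenizer(peptide, return_tensors="pt")
label = torch.tensor([1])                # 1 = hemolytic (hypothetical label)

# One fine-tuning step: forward pass, cross-entropy loss, backprop, update.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative learning rate
outputs = model(**inputs, labels=label)
outputs.loss.backward()
optimizer.step()

In practice, the same loop would run over mini-batches of labeled peptide sequences for each downstream task (hemolysis, solubility, nonfouling), with the classification head trained jointly with the pretrained encoder.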

Citation (APA)

Guntuboina, C., Das, A., Mollaei, P., Kim, S., & Barati Farimani, A. (2023). PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction. Journal of Physical Chemistry Letters, 14(46), 10427–10434. https://doi.org/10.1021/acs.jpclett.3c02398
