BiRNA-BERT allows efficient RNA language modeling with adaptive tokenization

Abstract

Transformer-based models have achieved remarkable success in biological sequence modeling, yet their application to RNA remains constrained by sequence length limitations. Existing RNA language models often truncate inputs, discarding distal nucleotide context that is crucial for full-length tasks. Additionally, advanced NLP tokenization methods do not apply directly to biological sequences, where nucleotide-level resolution is essential for tasks such as secondary structure prediction. To address these challenges, we introduce BiRNA-BERT, a 117M-parameter Transformer encoder trained on 36 million non-coding RNA sequences. At its core is an adaptive dual-tokenization framework that combines nucleotide-level (NUC) encoding for fine-grained structural tasks with byte-pair encoding (BPE) for efficient long-sequence processing. BiRNA-BERT dynamically selects the tokenization based on input length, enabling it to process arbitrarily long sequences without truncation. We demonstrate state-of-the-art performance across tasks ranging from short-sequence classification to long-context modeling and fine-grained, nucleotide-level RNA structure prediction. Our information-theoretic analysis reveals the trade-offs between BPE compression and NUC tokenization, which we validate empirically. Finally, BiRNA-BERT achieves strong intrinsic language modeling performance, measured by perplexity and token recovery, while remaining more compact than existing RNA models. The code and model weights are available at https://github.com/buetnlpbio/BiRNA-BERT.
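
The length-based choice between NUC and BPE tokenization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tiny `TOY_BPE_VOCAB`, and the `max_tokens` threshold are all hypothetical, and a greedy longest-match segmenter stands in for a learned BPE tokenizer.

```python
from typing import List, Tuple

# Hypothetical toy vocabulary. A real BPE vocabulary is learned from a corpus.
TOY_BPE_VOCAB = {"AUG", "GC", "AU", "CG", "UU", "A", "C", "G", "U"}


def nuc_tokenize(seq: str) -> List[str]:
    """Nucleotide-level tokenization: one token per base."""
    return list(seq)


def toy_bpe_tokenize(seq: str, vocab=TOY_BPE_VOCAB) -> List[str]:
    """Greedy longest-match segmentation, used here as a stand-in for BPE."""
    longest = max(len(token) for token in vocab)
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first, then shrink.
        for size in range(min(longest, len(seq) - i), 0, -1):
            piece = seq[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown symbol: emit it as its own token.
            tokens.append(seq[i])
            i += 1
    return tokens


def adaptive_tokenize(seq: str, max_tokens: int = 8) -> Tuple[str, List[str]]:
    """Pick NUC when the sequence fits in max_tokens, otherwise BPE.

    max_tokens is an assumed, illustrative threshold, not the model's
    actual context length.
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) <= max_tokens:
        return "NUC", nuc_tokenize(seq)
    return "BPE", toy_bpe_tokenize(seq)


short_mode, short_tokens = adaptive_tokenize("AUGC")
long_mode, long_tokens = adaptive_tokenize("AUGGCAUCGUU")
print(short_mode, short_tokens)
print(long_mode, long_tokens)
```

In this sketch the short input keeps single-nucleotide resolution, while the longer input is compressed into fewer multi-base tokens. A compressed sequence can still exceed the budget, so a real pipeline would need its own policy for that case.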

Cite

Tahmid, M. T., Shahgir, H. S., Mahbub, S., Dong, Y., & Bayzid, M. S. (2025). BiRNA-BERT allows efficient RNA language modeling with adaptive tokenization. Communications Biology, 8(1). https://doi.org/10.1038/s42003-025-08982-0
