Can we Pretrain a SotA Legal Language Model on a Budget From Scratch?

Abstract

Even though many efficient transformers have been proposed, only a few such models are available for specialized domains. Moreover, since pretraining is extremely costly in general, and even more so as the sequence length grows, it is often only within reach of large research labs. One way of making pretraining cheaper is the Replaced Token Detection (RTD) task, which provides more training signal than Masked Language Modeling (MLM) because the loss is computed over all tokens rather than only the masked ones. In this work, we train Longformer models with the efficient RTD task on long-context legal data to show that pretraining efficient LMs is possible using fewer than 12 GPU days. We evaluate the trained models on challenging summarization tasks that require the model to summarize complex long documents. We find that both the small and base models outperform their baselines on the in-domain BillSum and out-of-domain PubMed tasks in their respective parameter ranges. We publish our models as a resource for researchers and practitioners.
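
The following is a minimal sketch, not the authors' code, of the point the abstract makes about RTD versus MLM: MLM's cross-entropy only sees the roughly 15% of masked positions, while an ELECTRA-style RTD discriminator receives a binary "replaced or original?" label at every token, so every position contributes to the loss. All shapes, masking rates, and tensor names below are illustrative assumptions.

```python
# Illustrative comparison of loss coverage: MLM vs. Replaced Token Detection (RTD).
# Not the paper's implementation; shapes and the 15% corruption rate are assumptions.
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 20, 100

# --- MLM: loss restricted to masked positions only ---
mlm_logits = torch.randn(batch, seq_len, vocab)                     # per-token vocabulary logits
mlm_labels = torch.full((batch, seq_len), -100, dtype=torch.long)   # -100 is ignored by cross_entropy
masked_pos = torch.zeros(batch, seq_len, dtype=torch.bool)
masked_pos[:, ::7] = True                                           # pretend ~15% of tokens were masked
mlm_labels[masked_pos] = torch.randint(0, vocab, (int(masked_pos.sum()),))
mlm_loss = F.cross_entropy(mlm_logits.view(-1, vocab), mlm_labels.view(-1), ignore_index=-100)

# --- RTD: binary "was this token replaced?" loss over every token ---
rtd_logits = torch.randn(batch, seq_len)                            # one logit per token position
rtd_labels = (torch.rand(batch, seq_len) < 0.15).float()            # 1 = replaced by a generator, 0 = original
rtd_loss = F.binary_cross_entropy_with_logits(rtd_logits, rtd_labels)

print(f"MLM positions contributing to the loss: {int(masked_pos.sum())} / {batch * seq_len}")
print(f"RTD positions contributing to the loss: {batch * seq_len} / {batch * seq_len}")
```

Because every token supplies a gradient signal under RTD, fewer pretraining steps are needed to reach a given quality level, which is what makes the sub-12-GPU-day budget reported in the abstract plausible.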

Cite (APA)

Niklaus, J., & Giofré, D. (2023). Can we Pretrain a SotA Legal Language Model on a Budget From Scratch? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 158–182). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sustainlp-1.11
