AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization

Abstract

Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures pretrained on large corpora. While most existing models focus on English, Arabic remains understudied. In this paper, we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART (Lewis et al., 2020). We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model, multilingual BART, Arabic T5, and a multilingual T5 model. AraBART is publicly available on GitHub and the Hugging Face model hub.
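Since the abstract notes that the model is released on the Hugging Face model hub, the following minimal sketch shows how such a checkpoint could be loaded and used for summarization with the transformers library. The model identifier "moussaKam/AraBART", the placeholder input, and the generation settings are assumptions for illustration, not details taken from the paper.

# Minimal sketch (assumptions noted above): load AraBART from the Hugging Face
# hub and produce an abstractive summary of an Arabic article.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "moussaKam/AraBART"  # assumed hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "..."  # an Arabic news article to summarize (placeholder)

# Tokenize the input and generate a summary with beam search.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs,
    num_beams=4,        # assumed decoding setting
    max_length=128,     # assumed summary length limit
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))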

Citation (APA)

Kamal Eddine, M., Tomeh, N., Habash, N., Le Roux, J., & Vazirgiannis, M. (2022). AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP 2022) (pp. 31–42). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.wanlp-1.4
