AlGhafa Evaluation Benchmark for Arabic Language Models

14 Citations: citations of this article
16 Readers: Mendeley users who have this article in their library
Abstract

Recent advances in the space of Arabic large language models have opened up a wealth of potential practical applications. Through optimal training strategies, large-scale data acquisition, and continuously growing NLP resources, the Arabic LLM landscape has improved in a very short span of time, despite being hampered by training data scarcity and limited evaluation resources compared to English. To contribute to this ever-growing field, we introduce AlGhafa, a new multiple-choice evaluation benchmark for Arabic LLMs. For showcasing purposes, we train a new suite of models, including a 14-billion-parameter model, the largest monolingual Arabic decoder-only model to date. We use a collection of publicly available datasets, as well as a newly introduced Hand Made dataset consisting of 8 billion tokens. Finally, we explore the quantitative and qualitative toxicity of several Arabic models, comparing our models to existing public Arabic LLMs.

Cite

CITATION STYLE

APA

Almazrouei, E., Cojocaru, R., Baldo, M., Malartic, Q., Alobeidli, H., Mazzotta, D., … Noune, B. (2023). AlGhafa Evaluation Benchmark for Arabic Language Models. In ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings (pp. 244–275). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.arabicnlp-1.21
