We present a large-scale, 26,000-lemma leveled readability lexicon for Modern Standard Arabic. The lexicon was manually annotated in triplicate by language professionals from three regions of the Arab world. The annotations show a high degree of agreement, and major differences were limited to regional variations. Comparing lemma readability levels with their frequencies provided useful insights into the benefits and pitfalls of frequency-based readability approaches. The lexicon will be publicly available.
Al Khalil, M., Habash, N., & Jiang, Z. (2020). A large-scale leveled readability lexicon for standard Arabic. In LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 3053–3062). European Language Resources Association (ELRA).