Revisiting non-English Text Simplification: A Unified Multilingual Benchmark


Abstract

Recent advancements in high-quality, large-scale English resources have pushed the frontier of English Automatic Text Simplification (ATS) research. However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark covering complex-simple sentence pairs in many languages. This paper introduces the MULTISIM benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. This benchmark will encourage research in developing more effective multilingual text simplification models and evaluation metrics. Our experiments using MULTISIM with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings. We observe strong performance from Russian in zero-shot cross-lingual transfer to low-resource languages. We further show that few-shot prompting with BLOOM-176b achieves quality comparable to reference simplifications, outperforming fine-tuned models in most languages. We validate these findings through human evaluation.
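The few-shot prompting setup mentioned above can be illustrated with a minimal sketch of prompt construction: several complex-simple demonstration pairs are concatenated before the target sentence, and the model is asked to continue after "Simple:". The pair format and the demonstration sentences below are invented placeholders for illustration, not the paper's actual prompt template or MULTISIM data; the 176B-parameter BLOOM model itself is omitted since it cannot be loaded in a short example.

```python
# Hedged sketch: few-shot prompt assembly for sentence simplification.
# The "Complex:/Simple:" template and demo pairs are illustrative assumptions,
# not the authors' exact prompt format.

def build_few_shot_prompt(examples, target_sentence):
    """Join (complex, simple) demonstration pairs, then append the
    target sentence with an open 'Simple:' slot for the LM to fill."""
    parts = []
    for complex_s, simple_s in examples:
        parts.append(f"Complex: {complex_s}\nSimple: {simple_s}")
    parts.append(f"Complex: {target_sentence}\nSimple:")
    return "\n\n".join(parts)

# Invented placeholder demonstrations.
demo_pairs = [
    ("The committee postponed the deliberations indefinitely.",
     "The committee put off the talks for now."),
    ("The precipitation impeded vehicular traffic.",
     "The rain slowed down the cars."),
]

prompt = build_few_shot_prompt(
    demo_pairs,
    "He endeavored to ascertain the veracity of the claim.",
)
print(prompt)
```

The resulting string would then be passed to a generative model such as BLOOM, with the text generated after the final "Simple:" taken as the candidate simplification.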

Citation (APA)

Ryan, M. J., Naous, T., & Xu, W. (2023). Revisiting non-English Text Simplification: A Unified Multilingual Benchmark. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4898–4927). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.269
