The performance of automatic summarization models has improved dramatically in recent years. Yet there is still a gap in meeting the specific information needs of users in real-world scenarios, particularly when a targeted summary is sought, as in the useful aspect-based summarization setting targeted in this paper. Previous datasets and studies for this setting have predominantly concentrated on a limited set of pre-defined aspects, focused solely on single-document inputs, or relied on synthetic data. To advance research on more realistic scenarios, we introduce OPENASP, a benchmark for multi-document open aspect-based summarization. This benchmark is created using a novel and cost-effective annotation protocol, by which an open aspect dataset is derived from existing generic multi-document summarization datasets. We analyze the properties of OPENASP, showcasing its high-quality content. Further, we show that the realistic open-aspect setting realized in OPENASP poses a challenge for current state-of-the-art summarization models, as well as for large language models.
Amar, S., Schiff, L., Ernst, O., Shefer, A., Shapira, O., & Dagan, I. (2023). OPENASP: A Benchmark for Multi-document Open Aspect-based Summarization. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1967–1991). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.121