Multi-XScience: A large-scale dataset for extreme multi-document summarization of scientific articles

Abstract

Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained using several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.

Cite

Lu, Y., Dong, Y., & Charlin, L. (2020). Multi-XScience: A large-scale dataset for extreme multi-document summarization of scientific articles. In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 8068–8074). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.emnlp-main.648
