CLTS: A New Chinese Long Text Summarization Dataset

Abstract

We present CLTS, a Chinese long text summarization dataset, to address the scarcity of large-scale, high-quality datasets for automatic summarization, which limits further research. To the best of our knowledge, it is the first long text summarization dataset in Chinese. Extracted from the Chinese news website ThePaper.cn (https://www.thepaper.cn/), the corpus contains more than 180,000 long Chinese articles with corresponding summaries written by professional editors and authors. The dataset is available for download at https://github.com/lxj5957/CLTS-Dataset. We train and evaluate several existing methods on CLTS to verify the utility and challenges of the dataset; the results show that the corpus is useful for establishing baselines that support further research on automatic text summarization.

Cite

Liu, X., Zhang, C., Chen, X., Cao, Y., & Li, J. (2020). CLTS: A New Chinese Long Text Summarization Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12430 LNAI, pp. 531–542). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60450-9_42
