A popular approach to extractive summarization is to cast it as sentence-level classification supervised by binary labels. However, the standard metric ROUGE measures text similarity rather than classifier performance. For example, BERTSUMEXT, the strongest extractive classifier to date, achieves a precision of only 32.9% over its top-3 extracted sentences (P@3) on the CNN/DM dataset. Clearly, current approaches cannot model the complex relationships among sentences with hard 0/1 targets. In this paper, we introduce DistilSum, which consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature. The student model is trained at the same temperature to match these informative soft targets and is tested at a temperature of 1 against the ground-truth labels. Compared with the large version of BERTSUMEXT, our experimental results on CNN/DM show a substantial improvement of 0.99 ROUGE-L (text similarity) and 3.95 P@3 (classifier performance). Our source code will be available on GitHub.
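To make the temperature-based distillation concrete, here is a minimal sketch, assuming PyTorch and per-sentence relevance logits from a teacher and a student scorer; the names (SentenceScorer, temperature, alpha) and the loss weighting are illustrative assumptions, not the authors' released DistilSum implementation.

```python
# Hypothetical sketch of temperature-scaled knowledge distillation for
# sentence-level extractive classification. Not the reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceScorer(nn.Module):
    """Maps per-sentence encoder features to a single relevance logit."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, sent_features: torch.Tensor) -> torch.Tensor:
        # sent_features: (batch, num_sentences, hidden_size)
        return self.linear(sent_features).squeeze(-1)  # (batch, num_sentences)


def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Combine a soft-target term (teacher logits at temperature T) with the
    usual binary cross-entropy on the 0/1 oracle labels."""
    # High-temperature sigmoids yield softer, higher-entropy targets.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(
        student_logits / temperature, soft_targets)
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures,
    # following the common convention from Hinton et al.'s distillation setup.
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, num_sents, hidden = 2, 10, 768
    features = torch.randn(batch, num_sents, hidden)
    labels = torch.zeros(batch, num_sents)
    labels[:, :3] = 1.0  # pretend the first three sentences form the oracle summary

    teacher, student = SentenceScorer(hidden), SentenceScorer(hidden)
    with torch.no_grad():
        teacher_logits = teacher(features)

    student_logits = student(features)
    loss = distillation_loss(student_logits, teacher_logits, labels, temperature=2.0)
    loss.backward()

    # At test time the student runs at temperature 1 (its raw logits), and the
    # top-3 scored sentences form the extractive summary evaluated by P@3.
    top3 = torch.sigmoid(student_logits).topk(k=3, dim=-1).indices
    print(loss.item(), top3)
```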