DistilSum: Distilling the Knowledge for Extractive Summarization

Abstract

A popular choice for extractive summarization is to conceptualize it as sentence-level classification, supervised by binary labels. However, the common evaluation metric, ROUGE, measures text similarity rather than classifier performance. For example, BERTSUMEXT, the best extractive classifier so far, achieves a precision of only 32.9% on the top 3 extracted sentences (P@3) on the CNN/DM dataset. This indicates that current approaches cannot precisely model the complex relationships among sentences with 0/1 targets. In this paper, we introduce DistilSum, which consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature. Our student model is trained at the same temperature to match these informative soft targets and tested at a temperature of 1 to distill the ground-truth labels. Compared with the large version of BERTSUMEXT, our experimental result on CNN/DM achieves a substantial improvement of 0.99 in ROUGE-L (text similarity) and 3.95 in P@3 (classifier performance). Our source code will be available on GitHub.
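To make the temperature mechanism in the abstract concrete, below is a minimal PyTorch sketch of the idea: the student is trained to match the teacher's temperature-softened sentence-level distributions, and at test time the temperature is set to 1 so the scores sharpen toward the 0/1 extraction labels. The temperature value, the two-way (extract / don't extract) logit layout, the top-3 selection, and all function names are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between the teacher's and student's temperature-softened
    distributions over {don't extract, extract} for each sentence.
    Shapes: [num_sentences, 2]. The T**2 factor keeps gradient magnitudes
    comparable across temperatures, as in standard knowledge distillation."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

def extract_top_k(student_logits, k=3):
    """At test time the temperature is 1: score each sentence with a plain
    softmax and pick the k highest-scoring sentences as the summary."""
    probs = F.softmax(student_logits, dim=-1)[:, 1]  # P(sentence is extracted)
    return torch.topk(probs, k=min(k, probs.size(0))).indices
```

In this reading, the high training temperature exposes the teacher's relative preferences among sentences (high-entropy soft targets) rather than only its hard 0/1 decisions, which is the signal the binary labels alone cannot provide.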

Citation (APA)

Jia, R., Cao, Y., Shi, H., Fang, F., Liu, Y., & Tan, J. (2020). DistilSum: Distilling the Knowledge for Extractive Summarization. In International Conference on Information and Knowledge Management, Proceedings (pp. 2069–2072). Association for Computing Machinery. https://doi.org/10.1145/3340531.3412078
