A popular approach to extractive summarization is to cast it as sentence-level classification supervised by binary labels. However, the standard metric ROUGE measures text similarity rather than classifier performance. For example, BERTSUMEXT, the strongest extractive classifier to date, achieves a precision of only 32.9% over its top-3 extracted sentences (P@3) on the CNN/DM dataset. Clearly, current approaches cannot model the complex relationships among sentences with hard 0/1 targets. In this paper, we introduce DistilSum, which consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature. The student model is trained at the same temperature to match these informative soft targets and is tested at a temperature of 1 against the ground-truth labels. Compared with the large version of BERTSUMEXT, our experimental results on CNN/DM show a substantial improvement of 0.99 ROUGE-L (text similarity) and 3.95 P@3 (classifier performance). Our source code will be available on GitHub.
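To make the temperature-based distillation concrete, here is a minimal sketch, assuming PyTorch and per-sentence relevance logits from a teacher and a student scorer; the names (SentenceScorer, temperature, alpha) and the loss weighting are illustrative assumptions, not the authors' released DistilSum implementation.

```python
# Hypothetical sketch of temperature-scaled knowledge distillation for
# sentence-level extractive classification. Not the reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceScorer(nn.Module):
    """Maps per-sentence encoder features to a single relevance logit."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, sent_features: torch.Tensor) -> torch.Tensor:
        # sent_features: (batch, num_sentences, hidden_size)
        return self.linear(sent_features).squeeze(-1)  # (batch, num_sentences)


def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Combine a soft-target term (teacher logits at temperature T) with the
    usual binary cross-entropy on the 0/1 oracle labels."""
    # High-temperature sigmoids yield softer, higher-entropy targets.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(
        student_logits / temperature, soft_targets)
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures,
    # following the common convention from Hinton et al.'s distillation setup.
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, num_sents, hidden = 2, 10, 768
    features = torch.randn(batch, num_sents, hidden)
    labels = torch.zeros(batch, num_sents)
    labels[:, :3] = 1.0  # pretend the first three sentences form the oracle summary

    teacher, student = SentenceScorer(hidden), SentenceScorer(hidden)
    with torch.no_grad():
        teacher_logits = teacher(features)

    student_logits = student(features)
    loss = distillation_loss(student_logits, teacher_logits, labels, temperature=2.0)
    loss.backward()

    # At test time the student runs at temperature 1 (its raw logits), and the
    # top-3 scored sentences form the extractive summary evaluated by P@3.
    top3 = torch.sigmoid(student_logits).topk(k=3, dim=-1).indices
    print(loss.item(), top3)
```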