Self-supervised Pre-training and Semi-supervised Learning for Extractive Dialog Summarization

Abstract

Language model pre-training has led to state-of-the-art performance in text summarization. While a variety of pre-trained transformer models are available nowadays, they are mostly trained on documents. In this study, we introduce self-supervised pre-training to enhance the BERT model's semantic and structural understanding of dialog texts from social media. We also propose a semi-supervised teacher-student learning framework to address the common issue of limited labeled data in summarization datasets. We empirically evaluate our approach on the extractive summarization task with the TWEETSUMM corpus, a recently introduced dialog summarization dataset of Twitter customer-care conversations, and demonstrate that our self-supervised pre-training and semi-supervised teacher-student learning are both beneficial in comparison to other pre-trained models. Additionally, we compare pre-training and teacher-student learning in various low-resource settings and find that pre-training outperforms teacher-student learning, with the difference between the two becoming more pronounced when labels are scarce.
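The abstract only names the two techniques, so the sketch below illustrates the general teacher-student idea it refers to: extractive dialog summarization framed as per-utterance binary classification, with a supervised loss on labeled dialogs and a confidence-filtered pseudo-label loss on unlabeled ones. This is a minimal, hypothetical sketch, not the authors' implementation; the UtteranceScorer head, the 0.9 confidence threshold, and the equal loss weighting are assumptions, and utterance embeddings are presumed to come from a BERT-style encoder.

```python
# Illustrative sketch only (not the paper's code): semi-supervised
# teacher-student training for extractive summarization, where each dialog
# utterance is scored for inclusion in the summary.
import torch
import torch.nn as nn


class UtteranceScorer(nn.Module):
    """Scores each utterance embedding for inclusion in the extractive summary."""

    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, utterance_embeddings):
        # utterance_embeddings: (batch, num_utterances, hidden_size)
        return self.classifier(utterance_embeddings).squeeze(-1)  # logits


def pseudo_label(teacher, unlabeled_embeddings, threshold=0.9):
    """Teacher assigns hard pseudo-labels; only confident utterances are kept."""
    with torch.no_grad():
        probs = torch.sigmoid(teacher(unlabeled_embeddings))
    # Keep utterances the teacher is confident about in either direction.
    mask = ((probs > threshold) | (probs < 1 - threshold)).float()
    return (probs > 0.5).float(), mask


def train_student(student, teacher, labeled, unlabeled, epochs=3, lr=1e-4):
    """Train the student on labeled data plus confident teacher pseudo-labels."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss(reduction="none")
    for _ in range(epochs):
        for (x_l, y_l), x_u in zip(labeled, unlabeled):
            y_u, mask = pseudo_label(teacher, x_u)
            loss_labeled = bce(student(x_l), y_l).mean()
            loss_unlabeled = (bce(student(x_u), y_u) * mask).sum() / mask.sum().clamp(min=1)
            loss = loss_labeled + loss_unlabeled  # equal weighting is an assumption
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In this framing, `labeled` yields pairs of utterance-embedding tensors and binary labels, while `unlabeled` yields embedding tensors only; the teacher would typically be a model first fine-tuned on the labeled subset, and the student is trained on both losses.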

Citation (APA)
Zhuang, Y., Song, J., Sadagopan, N., & Beniwal, A. (2023). Self-supervised Pre-training and Semi-supervised Learning for Extractive Dialog Summarization. In ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023 (pp. 1069–1076). Association for Computing Machinery, Inc. https://doi.org/10.1145/3543873.3587680
