HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization

Citations: 243
Readers (Mendeley): 475

Abstract

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels that are created heuristically with rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding, together with a method to pre-train it using unlabeled data. We apply the pre-trained HIBERT to our summarization model, and it outperforms its randomly initialized counterpart by 1.25 ROUGE on the CNN/Dailymail dataset and by 2.0 ROUGE on a version of the New York Times dataset. We also achieve state-of-the-art performance on these two datasets.
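To make the hierarchical encoding idea concrete, the sketch below (not the authors' code) shows one way such an encoder can be structured in PyTorch: a sentence-level Transformer encodes each sentence's tokens, the first position is pooled as the sentence embedding, and a document-level Transformer contextualizes the sentence embeddings. The class name, dimensions, pooling choice, and omission of positional encodings are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Minimal sketch of a hierarchical bidirectional Transformer document encoder."""
    def __init__(self, vocab_size=30522, d_model=512, nhead=8,
                 num_sent_layers=6, num_doc_layers=6):
        super().__init__()
        # Token embeddings (positional encodings omitted for brevity).
        self.tok_embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_sent_layers)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_doc_layers)

    def forward(self, token_ids):
        # token_ids: (batch, num_sents, sent_len); 0 is the padding id.
        b, n_sents, sent_len = token_ids.shape
        flat = token_ids.view(b * n_sents, sent_len)        # encode sentences independently
        pad_mask = flat.eq(0)                               # True where padded
        tok_states = self.sent_encoder(self.tok_embed(flat),
                                       src_key_padding_mask=pad_mask)
        sent_embs = tok_states[:, 0, :].view(b, n_sents, -1)  # first token as sentence embedding
        doc_states = self.doc_encoder(sent_embs)             # contextualize across sentences
        return doc_states                                     # (batch, num_sents, d_model)

# Example: a batch of 2 documents, each with 3 sentences of up to 10 tokens.
enc = HierarchicalEncoder()
out = enc(torch.randint(1, 30522, (2, 3, 10)))
print(out.shape)  # torch.Size([2, 3, 512])
```

The per-sentence document representations produced this way could then feed a sentence-classification head for extractive summarization; how the paper pre-trains the encoder on unlabeled documents is described in the full text, not in this sketch.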

Cite

APA

Zhang, X., Wei, F., & Zhou, M. (2019). HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 5059–5069). Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1499
