Abstract
Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head and covers both long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
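The core procedure the abstract describes, reading an unlabeled tree out of a summarizer's self-attention matrix, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the checkpoint (facebook/bart-large-cnn), the layer/head choice, and the within-span attention-mass scoring heuristic are all illustrative, and this toy version operates over subword tokens, whereas the paper derives trees over discourse units.

```python
# Minimal sketch: extract one self-attention head from a pre-trained
# summarizer's encoder, then induce an unlabeled binary constituency tree
# with a CKY-style dynamic program. Model, head index, and the span-scoring
# heuristic are assumptions for illustration only.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

text = ("The market fell sharply. Analysts blamed rising rates. "
        "Investors sold off tech stocks.")
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Encoder-only forward pass; attentions[l] has shape (batch, heads, seq, seq).
    enc = model.get_encoder()(**inputs, output_attentions=True)

layer, head = 5, 3                  # hypothetical head; the paper probes all heads
A = enc.attentions[layer][0, head].numpy()
A = A[1:-1, 1:-1]                   # drop BOS/EOS special-token rows/columns
A = (A + A.T) / 2                   # symmetrize so span scores ignore direction

def build_tree(A):
    """CKY-style search for the binary bracketing that maximizes total
    within-span attention mass (one simple scoring heuristic)."""
    n = A.shape[0]
    best, back = {}, {}
    for i in range(n):
        best[(i, i)] = 0.0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            span = A[i:j + 1, i:j + 1].sum()   # reward attention kept inside the span
            k_best, s_best = None, -np.inf
            for k in range(i, j):              # choose the best split point
                s = best[(i, k)] + best[(k + 1, j)]
                if s > s_best:
                    k_best, s_best = k, s
            best[(i, j)] = s_best + span
            back[(i, j)] = k_best

    def tree(i, j):                            # recover nested (left, right) tuples
        if i == j:
            return i
        k = back[(i, j)]
        return (tree(i, k), tree(k + 1, j))

    return tree(0, n - 1)

print(build_tree(A))   # nested tuples of token indices, e.g. ((0, (1, 2)), ...)
```

Symmetrizing the matrix reflects that an unlabeled constituency bracket has no direction; for the dependency-style trees the abstract also mentions, one could instead run a maximum-spanning-tree algorithm over the same (asymmetric) attention scores.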
Citation
Xiao, W., Huber, P., & Carenini, G. (2021). Predicting Discourse Trees from Transformer-based Neural Summarizers. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 4139–4152). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.436