A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling


Abstract

Topic segmentation is the process of dividing a text into semantically coherent segments, and segment labeling involves assigning a topic label to each of these segments. Previous work on this task has used sequence labeling, segment-extraction, and generative models. While these methods have yielded impressive results, existing generative models have struggled to accurately generate strings of segment boundaries, limiting their competitiveness in this area. In this paper, we present a novel Sequence-to-Sequence approach with Mixed Pointers (Seq2Seq-MP). Seq2Seq-MP employs an encoder-decoder architecture with a pointer mechanism to generate both segment boundaries and topics, which makes it more robust than string-generation models and better at handling long-range dependencies than sequence labeling and segment-extraction models. Additionally, we introduce pairwise type encoding and type-aware relative position encoding to improve the fusion of type and position information, enhancing the interactions between sentences and topics in the encoder and decoder. Our experiments on public datasets show that Seq2Seq-MP outperforms the current state of the art, with improvements of up to 2.9% in Pk and 4.0% in F1.
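
The abstract reports results in Pk, the standard segmentation error metric: the fraction of sentence pairs a fixed distance k apart on which the reference and predicted segmentations disagree about whether the two sentences fall in the same segment. Lower is better, so an improvement in Pk is a reduction. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `pk_score`, the per-sentence segment-id input format, and the rounding used for the default window are assumptions made for illustration.

```python
def pk_score(reference, hypothesis, k=None):
    """Compute the Pk segmentation error (lower is better).

    reference, hypothesis: lists with one segment id per sentence,
    e.g. [0, 0, 0, 1, 1, 1]. This input format is an assumption of the sketch.
    k: window size; by convention, half the mean reference segment length.
    """
    n = len(reference)
    if n != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    if k is None:
        # Count reference segments by counting changes in segment id.
        num_segments = 1 + sum(
            reference[i] != reference[i - 1] for i in range(1, n)
        )
        # Half the mean segment length, at least 1 (rounding is a sketch choice).
        k = max(1, round(n / (2 * num_segments)))

    total = n - k
    if total <= 0:
        raise ValueError("text is too short for the chosen window size k")

    errors = 0
    for i in range(total):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        # An error is any disagreement about "same segment or not".
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / total


if __name__ == "__main__":
    gold = [0, 0, 0, 1, 1, 1]
    predicted = [0, 0, 1, 1, 1, 1]  # boundary placed one sentence too early
    print(pk_score(gold, gold))
    print(pk_score(gold, predicted))
```

Because Pk compares only same-segment decisions inside a sliding window, a boundary that is off by one sentence is penalized less than a boundary that is missing entirely.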

Cite

Xia, J., & Wang, H. (2023). A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2683–2693). Association for Computing Machinery. https://doi.org/10.1145/3580305.3599245
