A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling

Jinxiong Xia; Houfeng Wang

Conference Proceedings, Open Access

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2023), pp. 2683–2693

DOI: 10.1145/3580305.3599245

Abstract

Topic segmentation divides a text into semantically coherent segments, and segment labeling assigns a topic label to each segment. Previous work on this task has used sequence labeling, segment extraction, and generative models. While these methods have yielded impressive results, existing generative models struggle to generate strings of segment boundaries accurately, which limits their competitiveness. In this paper, we present a novel Sequence-to-Sequence approach with Mixed Pointers (Seq2Seq-MP). Seq2Seq-MP uses an encoder-decoder architecture with a pointer mechanism to generate both segment boundaries and topics, which makes it more robust than string-generation models and better at handling long-range dependencies than sequence labeling and segment-extraction models. We also introduce pairwise type encoding and type-aware relative position encoding to improve the fusion of type and position information, enhancing the interactions between sentences and topics in the encoder and decoder. Experiments on public datasets show that Seq2Seq-MP outperforms the current state of the art, with improvements of up to 2.9% in Pk and 4.0% in F1.
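For readers unfamiliar with the Pk metric cited in the abstract, the following is a minimal sketch of the standard windowed definition, not the authors' evaluation code. Pk is an error rate, so lower values are better. The function names `segment_ids` and `pk` are hypothetical, and the boundary encoding (a 1 marks the last sentence of a segment) and the default window size (half the average reference segment length) are assumptions based on the usual convention.

```python
from typing import List, Optional


def segment_ids(boundaries: List[int]) -> List[int]:
    """Map each sentence to the index of the segment that contains it.

    Assumption: boundaries[i] == 1 means sentence i is the last sentence
    of its segment; any other value means the segment continues.
    """
    ids = []
    current = 0
    for b in boundaries:
        ids.append(current)
        if b == 1:
            current += 1
    return ids


def pk(reference: List[int], hypothesis: List[int],
       k: Optional[int] = None) -> float:
    """Windowed segmentation error Pk (lower is better).

    For every pair of sentences k positions apart, check whether the
    reference and the hypothesis agree on "same segment" versus
    "different segments", and return the fraction of disagreements.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    n = len(reference)
    if n == 0:
        raise ValueError("cannot score an empty document")

    ref_ids = segment_ids(reference)
    hyp_ids = segment_ids(hypothesis)

    if k is None:
        # Conventional default: half the average reference segment length.
        num_segments = ref_ids[-1] + 1
        k = max(1, round(n / num_segments / 2))

    windows = n - k
    if windows <= 0:
        raise ValueError("window size k must be smaller than the document")

    errors = 0
    for i in range(windows):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / windows
```

Calling `pk(reference, reference)` on any valid boundary list returns 0.0, since the two segmentations agree in every window.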

Citation (APA)

Xia, J., & Wang, H. (2023). A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2683–2693). Association for Computing Machinery. https://doi.org/10.1145/3580305.3599245
