Abstract
Topic segmentation is the process of dividing a text into semantically coherent segments, and segment labeling assigns a topic label to each of these segments. Previous work on this task has used sequence labeling, segment-extraction, and generative models. While these methods have yielded impressive results, existing generative models have struggled to accurately generate strings of segment boundaries, which limits their competitiveness in this area. In this paper, we present a novel Sequence-to-Sequence approach with Mixed Pointers (Seq2Seq-MP). Seq2Seq-MP employs an encoder-decoder architecture with a pointer mechanism to generate both segment boundaries and topics. This makes it more robust than string-generation models and better at handling long-range dependencies than sequence labeling and segment-extraction models. Additionally, we introduce pairwise type encoding and type-aware relative position encoding to improve the fusion of type and position information, enhancing the interactions between sentences and topics in the encoder and decoder. Our experiments on public datasets show that Seq2Seq-MP outperforms the current state of the art, with improvements of up to 2.9% in Pk and 4.0% in F1.
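The abstract reports results in Pk without defining it. The sketch below is a minimal illustration of the standard Pk metric, not the authors' evaluation code. It assumes a hypothetical output format in which each segment is given as an (index of its last sentence, topic label) pair, standing in for what a pointer-based decoder might emit. The helper names `to_segment_ids` and `pk` are hypothetical, and conventions for choosing the window size k and for sliding the window differ across implementations.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical segment format: (index of the last sentence in the segment, topic label).
# A pointer decoder could produce the index by pointing at a sentence and the label
# by pointing at a topic; here the pairs are simply given as input.
Segments = Sequence[Tuple[int, str]]


def to_segment_ids(segments: Segments, num_sentences: int) -> List[int]:
    """Expand (end_index, label) pairs into one segment id per sentence."""
    ids: List[int] = []
    start = 0
    for seg_id, (end, _label) in enumerate(segments):
        if end < start or end >= num_sentences:
            raise ValueError(f"invalid segment end index {end}")
        ids.extend([seg_id] * (end - start + 1))
        start = end + 1
    if start != num_sentences:
        raise ValueError("segments must cover every sentence exactly once")
    return ids


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Probability that sentences k apart are inconsistently classified as
    same-segment / different-segment by the hypothesis. Lower is better."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n = len(reference)
    if k is None:
        # Common convention: half the average reference segment length.
        num_segments = len(set(reference))
        k = max(1, round(n / num_segments / 2))
    if k >= n:
        raise ValueError("window size k must be smaller than the number of sentences")
    errors = 0
    for i in range(n - k):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_in_ref != same_in_hyp
    return errors / (n - k)


if __name__ == "__main__":
    ref = to_segment_ids([(3, "history"), (6, "geography"), (9, "economy")], 10)
    hyp = to_segment_ids([(4, "history"), (9, "economy")], 10)
    print(pk(ref, hyp))  # 0.5 (k = 2, 4 disagreements over 8 windows)
```

Pk only scores boundary placement; the topic labels carried in each pair are ignored here and would be evaluated separately (for example, with a label-level F1).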
Cite
Xia, J., & Wang, H. (2023). A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2683–2693). Association for Computing Machinery. https://doi.org/10.1145/3580305.3599245