SEGBOT: A generic neural text segmentation model with pointer network

Abstract

Text segmentation is a fundamental task in natural language processing that comes in two levels of granularity: (i) segmenting a document into a sequence of topical segments (topic segmentation), and (ii) segmenting a sentence into a sequence of elementary discourse units (EDU segmentation). Traditional solutions to the two tasks rely heavily on carefully designed features. Recently proposed neural models do not need manual feature engineering, but they either suffer from sparse boundary tags or cannot handle a variable-size output vocabulary well. We propose a generic end-to-end segmentation model called SegBot. SegBot uses a bidirectional recurrent neural network to encode the input text sequence. The model then uses another recurrent neural network together with a pointer network to select text boundaries in the input sequence. In this way, SegBot does not require hand-crafted features. More importantly, our model inherently handles both the variable-size output vocabulary and the sparsity of boundary tags. In our experiments, SegBot outperforms state-of-the-art models on both topic and EDU segmentation tasks.
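The abstract describes SegBot's layout only at a high level: a bidirectional RNN encoder, a second RNN acting as decoder, and a pointer network that picks boundary positions in the input. The following is a toy, untrained sketch of that decoding loop, not the authors' implementation. It rests on several assumptions the abstract does not state: plain Elman cells stand in for whatever recurrent units the model actually uses, the pointer scores positions with additive (Bahdanau-style) attention, the decoder at each step is fed the encoder state of the current segment's first unit, and the pointer may only choose positions from that unit onward. The names `ToySegBot`, `rnn_step`, and `to_segments` are hypothetical. Because each output is an index into the input, the set of possible outputs grows with the input length instead of being a fixed vocabulary.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def rnn_step(Wx: Matrix, Wh: Matrix, x: Sequence[float], h: Sequence[float]) -> Vector:
    # Elman cell h' = tanh(Wx x + Wh h); an assumed stand-in for a GRU/LSTM
    return [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]


class ToySegBot:
    """Hypothetical, untrained illustration of encoder + decoder + pointer."""

    def __init__(self, in_dim: int, hid: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hid = hid
        self.fwd = (rand_matrix(rng, hid, in_dim), rand_matrix(rng, hid, hid))
        self.bwd = (rand_matrix(rng, hid, in_dim), rand_matrix(rng, hid, hid))
        enc_dim = 2 * hid  # forward and backward states are concatenated
        self.dec = (rand_matrix(rng, hid, enc_dim), rand_matrix(rng, hid, hid))
        self.W1 = rand_matrix(rng, hid, enc_dim)  # projects encoder states
        self.W2 = rand_matrix(rng, hid, hid)      # projects the decoder state
        self.v = [rng.uniform(-1.0, 1.0) for _ in range(hid)]

    def encode(self, xs: List[Vector]) -> List[Vector]:
        # Bidirectional pass: run left-to-right and right-to-left, then concatenate
        h, fwd = [0.0] * self.hid, []
        for x in xs:
            h = rnn_step(*self.fwd, x, h)
            fwd.append(h)
        h, bwd = [0.0] * self.hid, []
        for x in reversed(xs):
            h = rnn_step(*self.bwd, x, h)
            bwd.append(h)
        bwd.reverse()
        return [f + b for f, b in zip(fwd, bwd)]

    def point(self, enc: List[Vector], d: Vector, start: int) -> Tuple[int, Vector]:
        # Additive attention u_j = v^T tanh(W1 e_j + W2 d), scored only for j >= start
        wd = matvec(self.W2, d)
        scores = []
        for e in enc[start:]:
            hidden = [math.tanh(a + b) for a, b in zip(matvec(self.W1, e), wd)]
            scores.append(sum(vi * hi for vi, hi in zip(self.v, hidden)))
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [x / z for x in exps]  # softmax over candidate boundary positions
        best = max(range(len(probs)), key=probs.__getitem__)
        return start + best, probs

    def segment(self, xs: List[Vector]) -> List[int]:
        enc = self.encode(xs)
        d, start, boundaries = [0.0] * self.hid, 0, []
        while start < len(enc):
            # Assumed decoder input: encoder state of the current segment's first unit
            d = rnn_step(*self.dec, enc[start], d)
            boundary, _ = self.point(enc, d, start)
            boundaries.append(boundary)
            start = boundary + 1  # next segment begins right after the boundary
        return boundaries


def to_segments(items: list, boundaries: List[int]) -> List[list]:
    # Split items so that each segment ends at (and includes) its boundary index
    out, start = [], 0
    for b in boundaries:
        out.append(items[start:b + 1])
        start = b + 1
    return out
```

With random inputs, for example `xs = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(10)]`, calling `ToySegBot(in_dim=4, hid=8).segment(xs)` returns strictly increasing boundary indices whose last element is always `len(xs) - 1`, and `to_segments` turns them into contiguous spans. One reading of the "sparse boundary tags" point: a tagger would label every unit, mostly with "no boundary", whereas this loop makes one decision per segment, and every decision is a boundary.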

Citation (APA)

Li, J., Sun, A., & Joty, S. (2018). SEGBOT: A generic neural text segmentation model with pointer network. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 4166–4172). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/579
