Flexible text segmentation with structured multilabel classification

Abstract

Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguous segments. We evaluate the model on entity extraction and noun-phrase chunking and show that it is more accurate for overlapping and non-contiguous segments, but it still performs well on simpler data sets for which sequential tagging has been the best method. © 2005 Association for Computational Linguistics.
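To make the representational point concrete, here is a minimal sketch, not the authors' implementation, of why overlapping and non-contiguous segments are awkward for sequential tagging. It assumes a plain B/I/O tag set and represents each segment as a set of token positions; the helper names (`bio_to_segments`, `is_contiguous`, `overlaps`) and the example sentence are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

Segment = FrozenSet[int]  # a segment is a set of token positions (assumption for this sketch)


def bio_to_segments(tags: List[str]) -> List[Segment]:
    """Decode a B/I/O tag sequence into segments.

    Every decoded segment is contiguous and no two segments share a token,
    because each token carries exactly one tag.
    """
    segments: List[Segment] = []
    current: set = set()
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new segment starts; close any open one first.
            if current:
                segments.append(frozenset(current))
            current = {i}
        elif tag == "I" and current:
            # Continue the open segment.
            current.add(i)
        else:
            # "O" (or a stray "I" with no open segment) closes the open segment.
            if current:
                segments.append(frozenset(current))
            current = set()
    if current:
        segments.append(frozenset(current))
    return segments


def is_contiguous(segment: Segment) -> bool:
    """True if the positions form one unbroken run."""
    return max(segment) - min(segment) + 1 == len(segment)


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments share at least one token position."""
    return bool(a & b)


# Hypothetical example: a coordinated phrase with two intended segments,
# "left ... ventricles" and "right ventricles".
tokens = ["the", "left", "and", "right", "ventricles"]
left = frozenset({1, 4})   # non-contiguous: skips "and right"
right = frozenset({3, 4})  # contiguous, but shares "ventricles" with `left`

print(is_contiguous(left), is_contiguous(right), overlaps(left, right))
print(bio_to_segments(["O", "B", "I", "O", "B"]))
```

In this sketch, no single B/I/O sequence over the five tokens decodes to both `left` and `right`, whereas a representation that assigns each segment its own set of positions holds them without conflict. That is the kind of flexibility the abstract attributes to the multilabel formulation.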

Cite

McDonald, R., Crammer, K., & Pereira, F. (2005). Flexible text segmentation with structured multilabel classification. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 987–994). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220699
