Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguous segments. We evaluate the model on entity extraction and noun-phrase chunking and show that it is more accurate for overlapping and non-contiguous segments, but it still performs well on simpler data sets for which sequential tagging has been the best method. © 2005 Association for Computational Linguistics.
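To make the contrast with sequential tagging concrete, here is a minimal sketch that treats each segment as a set of token positions. It is an illustration only, not the model from the paper: the names `Segment`, `is_contiguous`, `overlaps`, and `to_bio`, as well as the example phrase, are assumptions introduced here. It shows that a one-tag-per-token B/I/O encoding fails as soon as segments overlap or skip tokens, while the set representation has no such restriction.

```python
from typing import FrozenSet, List

# Hypothetical representation: a segment is the set of token positions it covers.
Segment = FrozenSet[int]


def is_contiguous(segment: Segment) -> bool:
    """Return True if the segment covers an unbroken run of token positions."""
    return max(segment) - min(segment) + 1 == len(segment)


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True if the two segments share at least one token."""
    return bool(a & b)


def to_bio(segments: List[Segment], n_tokens: int) -> List[str]:
    """Encode segments as one B/I/O tag per token.

    Raises ValueError when the segments cannot be expressed this way,
    i.e. when a segment is non-contiguous or two segments share a token.
    """
    tags = ["O"] * n_tokens
    for seg in segments:
        if not is_contiguous(seg):
            raise ValueError(f"non-contiguous segment: {sorted(seg)}")
        start = min(seg)
        for i in sorted(seg):
            if tags[i] != "O":
                raise ValueError(f"token {i} belongs to more than one segment")
            tags[i] = "B" if i == start else "I"
    return tags


# Illustrative phrase (an assumption, not data from the paper).
tokens = ["left", "and", "right", "ventricles"]
left_ventricles = frozenset({0, 3})   # "left ... ventricles" skips two tokens
right_ventricles = frozenset({2, 3})  # "right ventricles" shares token 3
```

With these definitions, `to_bio([right_ventricles], len(tokens))` succeeds, whereas passing both segments raises `ValueError`, because `left_ventricles` is non-contiguous and the two segments share the token "ventricles".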
McDonald, R., Crammer, K., & Pereira, F. (2005). Flexible text segmentation with structured multilabel classification. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 987–994). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220699