Unsupervised learning of field segmentation models for information extraction

Trond Grenager; Dan Klein; Christopher D. Manning

Conference Proceedings


ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2005) 371-378

DOI: 10.3115/1219840.1219886



Abstract

The applicability of many current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field-structured extraction tasks, such as classified advertisements and bibliographic citations, small amounts of prior knowledge can be used to learn effective models in a primarily unsupervised fashion. Although hidden Markov models (HMMs) provide a suitable generative model for field-structured text, general unsupervised HMM learning fails to learn useful structure in either of our domains. However, one can dramatically improve the quality of the learned structure by exploiting simple prior knowledge of the desired solutions. In both domains, we found that unsupervised methods can attain accuracies with 400 unlabeled examples comparable to those attained by supervised methods on 50 labeled examples, and that semi-supervised methods can make good use of small amounts of labeled data. © 2005 Association for Computational Linguistics.
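To make the idea of an HMM over field-structured text concrete, here is a minimal sketch of Viterbi decoding that segments a tokenized citation into fields. It is an illustration only, not the authors' model: the abstract does not say what prior knowledge was used, so the "sticky" transition matrix below (fields tend to continue rather than switch) is an assumption, and the state names, probabilities, vocabulary, and the `unk` penalty are all hypothetical toy values. The sketch covers decoding with fixed parameters; it does not perform unsupervised learning.

```python
import math


def viterbi(tokens, states, log_start, log_trans, log_emit, unk=-10.0):
    """Return the most likely state sequence for tokens under an HMM.

    unk is a hypothetical log-probability for tokens a state has never seen.
    """
    # best[t][s]: best log-probability of any path ending in state s at position t
    best = [{s: log_start[s] + log_emit[s].get(tokens[0], unk) for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at position t
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s].get(tok, unk)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Toy parameters (assumed for illustration, not taken from the paper).
states = ["author", "title"]
stay = 0.9  # assumed bias: a field usually continues on the next token
log_trans = {p: logs({s: stay if s == p else 1 - stay for s in states}) for p in states}
log_start = logs({"author": 0.9, "title": 0.1})
log_emit = {
    "author": logs({"grenager": 0.3, "klein": 0.3, "manning": 0.3, ",": 0.1}),
    "title": logs({"unsupervised": 0.25, "learning": 0.25, "of": 0.2, "models": 0.2, ",": 0.1}),
}

tokens = ["grenager", ",", "klein", ",", "manning",
          "unsupervised", "learning", "of", "models"]
path = viterbi(tokens, states, log_start, log_trans, log_emit)
for token, field in zip(tokens, path):
    print(f"{field}\t{token}")
```

With these toy numbers, the first five tokens decode as `author` and the remaining four as `title`. The commas are equally likely under both fields, so it is the transition bias that keeps them inside the author field.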

Cite (APA)

Grenager, T., Klein, D., & Manning, C. D. (2005). Unsupervised learning of field segmentation models for information extraction. In ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 371–378). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1219840.1219886
