Unsupervised grammar induction by distribution and attachment

David J. Brooks

Conference Proceedings

CoNLL 2006 - Proceedings of the 10th Conference on Computational Natural Language Learning (2006) 117-124

DOI: 10.3115/1596276.1596299

Abstract

Distributional approaches to grammar induction are typically inefficient, enumerating large numbers of candidate constituents. In this paper, we describe a simplified model of distributional analysis which uses heuristics to reduce the number of candidate constituents under consideration. We apply this model to a large corpus of over 400,000 words of written English, and evaluate the results using EVALB. We show that the performance of this approach is limited, providing a detailed analysis of learned structure and a comparison with actual constituent-context distributions. This motivates a more structured approach, using a process of attachment to form constituents from their distributional components. Our findings suggest that distributional methods do not generalize enough to learn syntax effectively from raw text, but that attachment methods are more successful.

Brooks, D. J. (2006). Unsupervised grammar induction by distribution and attachment. In CoNLL 2006 - Proceedings of the 10th Conference on Computational Natural Language Learning (pp. 117–124). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596276.1596299
