We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes. © 2008. Licensed under the Creative Commons.
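Induced labels are arbitrary symbols, so a labeled F-score can only be computed after they are mapped to gold-standard labels. The following is a minimal sketch of one such evaluation, assuming a greedy many-to-one mapping (each induced label is mapped to the gold label it most often shares a bracket with); this may differ from the two mapping methods used in the paper. The function names, the `(start, end, label)` span representation, and the assumption of at most one constituent per span are all illustrative choices, not taken from the original work.

```python
from collections import Counter, defaultdict


def many_to_one_mapping(induced, gold):
    """Map each induced label to the gold label it co-occurs with most often.

    `induced` and `gold` are parallel lists of trees; each tree is a list of
    (start, end, label) constituents. Hypothetical representation, assuming
    at most one constituent per span in each tree.
    """
    cooc = defaultdict(Counter)
    for ind_tree, gold_tree in zip(induced, gold):
        gold_by_span = {(s, e): lab for s, e, lab in gold_tree}
        for s, e, lab in ind_tree:
            # Only brackets whose span also appears in the gold tree count.
            if (s, e) in gold_by_span:
                cooc[lab][gold_by_span[(s, e)]] += 1
    return {lab: counts.most_common(1)[0][0] for lab, counts in cooc.items()}


def labeled_f_score(induced, gold, mapping):
    """Harmonic mean of labeled precision and recall after label mapping."""
    matched = n_induced = n_gold = 0
    for ind_tree, gold_tree in zip(induced, gold):
        # Induced labels with no mapping become None and never match.
        mapped = {(s, e, mapping.get(lab)) for s, e, lab in ind_tree}
        gold_set = set(gold_tree)
        matched += len(mapped & gold_set)
        n_induced += len(mapped)
        n_gold += len(gold_set)
    if matched == 0:
        return 0.0
    precision = matched / n_induced
    recall = matched / n_gold
    return 2 * precision * recall / (precision + recall)
```

A many-to-one mapping lets several induced labels share one gold label, so it tends to reward larger induced label sets; a one-to-one mapping is stricter and would require an assignment step instead of the per-label maximum used here.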
CITATION
Reichart, R., & Rappoport, A. (2008). Unsupervised induction of labeled parse trees by clustering with syntactic features. In Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 721–728). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1599081.1599172