Abstract
We present procedures that pool lexical information estimated from unlabeled data via the Inside-Outside algorithm with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantities of unlabeled training data, the re-estimated models show promising improvements in labeled bracketing f-scores on Wall Street Journal parsing and substantial benefit in acquiring the subcategorization preferences of low-frequency verbs. © 2008 Licensed under the Creative Commons.
Cite
Deoskar, T. (2008). Re-estimation of lexical parameters for treebank PCFGs. In Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 193–200). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1599081.1599106