We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The supervised component is Collins' parser, trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. We find that the combined system improves the performance of the parser only for small training sets. Surprisingly, the size of the unannotated corpus has little effect, because the lexical statistics acquired by unsupervised learning are noisy.
Atterer, M., & Schütze, H. (2006). The effect of corpus size in combining supervised and unsupervised training for disambiguation. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (pp. 25–32). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1273073.1273077