Exploiting unannotated natural language data is hard largely because unsupervised parameter estimation is hard. We describe deterministic annealing (DA; Rose et al., 1990) as an appealing alternative to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when one is available, and show significant improvements over EM on a grammar induction task.
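To make the annealing idea concrete, here is a minimal sketch of a DA-style EM loop on a toy two-component 1-D Gaussian mixture. This is not the paper's tagging or parsing model, only an illustration of the general recipe: flatten the E-step posteriors by an inverse-temperature parameter beta, start beta near 0 (where the objective is nearly concave and the posteriors nearly uniform), and raise it toward 1, where standard EM and the true non-concave likelihood are recovered. All names (da_em, beta0, rate) and the schedule are illustrative assumptions, not from the paper.

```python
import math
import random

def da_em(data, n_iters=200, beta0=0.1, rate=1.1):
    # Crude initialization: one mean at each extreme, uniform weights,
    # fixed unit variance to keep the sketch short.
    mu = [min(data), max(data)]
    pi = [0.5, 0.5]
    sigma2 = 1.0
    beta = beta0  # inverse temperature; beta = 1 is standard EM
    for _ in range(n_iters):
        # E-step at temperature 1/beta: each component's joint score is
        # raised to the power beta before normalizing, which flattens
        # the posteriors when beta is small.
        resp = []
        for x in data:
            scores = [
                (pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * sigma2))) ** beta
                for k in range(2)
            ]
            z = sum(scores)
            resp.append([s / z for s in scores])
        # M-step: ordinary maximum-likelihood updates from the
        # (flattened) expected counts.
        for k in range(2):
            total = sum(r[k] for r in resp)
            pi[k] = total / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / total
        # Annealing schedule: gradually morph the easy (near-concave)
        # objective into the true likelihood by raising beta toward 1.
        beta = min(1.0, beta * rate)
    return pi, mu

if __name__ == "__main__":
    random.seed(0)
    data = ([random.gauss(-2, 1) for _ in range(200)]
            + [random.gauss(3, 1) for _ in range(200)])
    print(da_em(data))
```

The paper's skewed DA variant would additionally bias the early, flat posteriors toward a good initializer rather than toward the uniform distribution; that refinement is omitted here for brevity.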
Smith, N. A., & Eisner, J. (2004). Annealing techniques for unsupervised statistical language learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 486–493). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1218955.1219017