Learning accurate, compact, and interpretable tree annotation

Slav Petrov; Leon Barrett; Romain Thibaux; Dan Klein

Conference Proceedings

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2006), Vol. 1, pp. 433-440

DOI: 10.3115/1220175.1220230

Abstract

We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems. © 2006 Association for Computational Linguistics.
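The split half of the split-and-merge procedure described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a grammar stored as a dictionary of rule probabilities, splits each nonterminal into two subsymbols, and adds a small random perturbation to break symmetry. The function names, the `ROOT` symbol, the `noise` parameter, and the toy grammar are all hypothetical. The EM re-estimation and the likelihood-based merge step are omitted.

```python
import itertools
import random
from collections import defaultdict


def subsymbols(sym, nonterminals, root):
    """Return the subsymbols a symbol splits into (terminals and root stay whole)."""
    if sym == root or sym not in nonterminals:
        return [sym]
    return [f"{sym}_0", f"{sym}_1"]


def split_grammar(rules, nonterminals, root="ROOT", noise=0.01, seed=0):
    """Split every nonterminal in two (hypothetical helper, split step only).

    rules maps (parent, children_tuple) -> probability.
    """
    rng = random.Random(seed)
    new_rules = {}
    for (parent, children), p in rules.items():
        # Every combination of child subsymbols becomes its own rule.
        options = [subsymbols(c, nonterminals, root) for c in children]
        combos = list(itertools.product(*options))
        for new_parent in subsymbols(parent, nonterminals, root):
            for combo in combos:
                # Share the mass evenly, with slight noise to break symmetry.
                jitter = 1.0 + noise * rng.uniform(-1.0, 1.0)
                new_rules[(new_parent, combo)] = p / len(combos) * jitter
    # Renormalize so each parent subsymbol's rules sum to one.
    totals = defaultdict(float)
    for (parent, _), p in new_rules.items():
        totals[parent] += p
    return {rule: p / totals[rule[0]] for rule, p in new_rules.items()}


# Toy grammar, invented for illustration only.
toy_rules = {
    ("ROOT", ("S",)): 1.0,
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dog",)): 1.0,
    ("VP", ("barks",)): 1.0,
}
toy_nonterminals = {"ROOT", "S", "NP", "VP"}
split_rules = split_grammar(toy_rules, toy_nonterminals)
```

In a full system, a step like this would presumably be followed by re-estimating the rule probabilities on the treebank and then merging back the splits that contribute least to the likelihood, which is what would let different symbols end up split to different degrees.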

Citation (APA)

Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 433–440). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220230
