Corpus conversion and grammar extraction have traditionally been portrayed as tasks that are performed once and never revisited (Burke et al., 2004). We report the successful implementation of an approach to these tasks that facilitates treating grammar engineering as an evolving process of improvement. Taking the standard version of the CCGbank (Hockenmaier and Steedman, 2007) as input, our system adds linguistic depth by augmenting it with attributes the original corpus lacks: Propbank roles and head lexicalization for case-marking prepositions (Boxwell and White, 2008), derivational restructuring for punctuation analysis (White and Rajkumar, 2008), named entity annotation, and lemmatization. Our implementation applies successive XSLT transforms, controlled by Apache Ant (http://ant.apache.org/), to an XML translation of this corpus, finally producing an OpenCCG grammar (http://openccg.sourceforge.net/). This design benefits grammar engineering both because XSLT is particularly well suited to performing arbitrary transformations of XML trees and because Ant provides fine-grained control over the process. The resulting system enables state-of-the-art BLEU scores for surface realization on Section 23 of the CCGbank.
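The idea of applying successive transforms to an XML translation of the corpus can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python functions stand in for the XSLT stylesheets, a simple list stands in for the ordering that Ant would control, and the element names, attribute names, and lookup tables (`Derivation`, `Leaf`, `TOY_LEMMAS`, `TOY_ENTITIES`) are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of two leaves of a derivation. The real XML
# translation of the CCGbank is not shown in the abstract and will differ.
SAMPLE = (
    '<Derivation>'
    '<Leaf word="Vinken" pos="NNP"/>'
    '<Leaf word="joined" pos="VBD"/>'
    '</Derivation>'
)

# Toy lookup tables (assumptions for illustration, not real resources).
TOY_LEMMAS = {"joined": "join"}
TOY_ENTITIES = {"Vinken": "PERSON"}


def add_lemmas(root):
    """Stage 1: stand-in for a lemmatization transform."""
    for leaf in root.iter("Leaf"):
        word = leaf.get("word")
        # Fall back to the word itself when no lemma is known.
        leaf.set("lemma", TOY_LEMMAS.get(word, word))
    return root


def add_named_entities(root):
    """Stage 2: stand-in for a named entity annotation transform."""
    for leaf in root.iter("Leaf"):
        label = TOY_ENTITIES.get(leaf.get("word"))
        if label is not None:
            leaf.set("ne", label)
    return root


# The ordered list plays the role that build-target ordering would play:
# each stage consumes the tree produced by the previous one.
PIPELINE = [add_lemmas, add_named_entities]


def run_pipeline(xml_text, stages=PIPELINE):
    """Parse the input once, then apply every stage in order."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return root


if __name__ == "__main__":
    result = run_pipeline(SAMPLE)
    print(ET.tostring(result, encoding="unicode"))
```

Because each stage only adds attributes to the tree it receives, a stage can be revised, reordered, or re-run without redoing the whole conversion, which is the property the abstract attributes to the staged design.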
Martin, S., Rajkumar, R., & White, M. (2009). Grammar Engineering for CCG using Ant and XSLT. In NAACL HLT 2009 - Software Engineering, Testing, and Quality Assurance for Natural Language Processing, SETQA-NLP 2009 - Proceedings of the Workshop (pp. 45–46). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1621947.1621955