Abstract
We describe a multi-step process for automatically learning two kinds of resources from parallel data: reliable sub-sentential syntactic phrases that are translation equivalents of each other, and syntactic translation rules between the two languages. The input to the process is a corpus of parallel sentences, word-aligned and annotated with phrase-structure parse trees. We first apply a newly developed algorithm for aligning parse-tree nodes between the two parallel trees. Next, we extract all aligned sub-sentential syntactic constituents from the parallel sentences and create a syntax-based phrase table. Finally, we treat the node alignments as tree decomposition points and extract from the corpus all possible synchronous parallel tree fragments. These are then converted into synchronous context-free rules. We describe the approach and analyze its application to Chinese-English parallel data.
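The first two steps of the pipeline can be illustrated with a minimal sketch. This is not the node-alignment algorithm of the paper, which the abstract does not specify. As a stand-in, it uses a simple span-consistency test: two constituents are paired when every word-alignment link that touches one span also falls inside the other. The names `consistent`, `align_nodes`, and `phrase_table`, the span encoding, and the toy sentence pair are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: a constituent is (label, start, end), end exclusive.
Span = Tuple[str, int, int]
Link = Tuple[int, int]  # (source word index, target word index)


def consistent(src: Span, tgt: Span, links: Set[Link]) -> bool:
    """True if the two spans share at least one link and no link crosses out."""
    _, s0, s1 = src
    _, t0, t1 = tgt
    shared = False
    for i, j in links:
        src_in = s0 <= i < s1
        tgt_in = t0 <= j < t1
        if src_in != tgt_in:
            return False  # a link leaves one span without entering the other
        shared = shared or src_in
    return shared


def align_nodes(src_nodes: List[Span], tgt_nodes: List[Span],
                links: Set[Link]) -> List[Tuple[Span, Span]]:
    """Pair every source constituent with every consistent target constituent."""
    return [(s, t) for s in src_nodes for t in tgt_nodes
            if consistent(s, t, links)]


def phrase_table(src_words: List[str], tgt_words: List[str],
                 pairs: List[Tuple[Span, Span]]) -> List[Tuple[str, str, str, str]]:
    """Turn aligned node pairs into (src label, src phrase, tgt label, tgt phrase)."""
    return [(s[0], " ".join(src_words[s[1]:s[2]]),
             t[0], " ".join(tgt_words[t[1]:t[2]])) for s, t in pairs]


# Toy, invented example (not data from the paper).
src_words = ["ta", "xihuan", "yinyue"]
tgt_words = ["he", "likes", "music"]
links = {(0, 0), (1, 1), (2, 2)}
src_nodes = [("NP", 0, 1), ("VV", 1, 2), ("NP", 2, 3), ("VP", 1, 3), ("IP", 0, 3)]
tgt_nodes = [("NP", 0, 1), ("VBZ", 1, 2), ("NP", 2, 3), ("VP", 1, 3), ("S", 0, 3)]

pairs = align_nodes(src_nodes, tgt_nodes, links)
table = phrase_table(src_words, tgt_words, pairs)
for row in table:
    print(row)
```

In the third step described above, the aligned node pairs would serve as the points at which the two trees are cut into synchronous fragments. That step is omitted here because it requires full tree structure rather than flat spans.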
Cite
Lavie, A., Parlikar, A., & Ambati, V. (2008). Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora. In Proceedings of SSST 2008 - 2nd Workshop on Syntax and Structure in Statistical Translation (pp. 87–95). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1626269.1626280