Inversion transduction grammar constraints for mining parallel sentences from quasi-comparable corpora

Dekai Wu; Pascale Fung

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3651 LNAI 257-268

DOI: 10.1007/11562214_23

29 Citations

72 Readers

Abstract

We present a new implication of Wu's (1997) Inversion Transduction Grammar (ITG) Hypothesis for the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language-universal constraint posited by the ITG Hypothesis, which can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical, since non-parallel multilingual data exists in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs. © Springer-Verlag Berlin Heidelberg 2005.

Cite

Wu, D., & Fung, P. (2005). Inversion transduction grammar constraints for mining parallel sentences from quasi-comparable corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3651 LNAI, pp. 257–268). Springer Verlag. https://doi.org/10.1007/11562214_23
