Using TectoMT as a preprocessing tool for phrase-based statistical machine translation

Daniel Zeman

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6231 LNAI 216-223

DOI: 10.1007/978-3-642-15760-8_28

Abstract

We present a systematic comparison of preprocessing techniques for two language pairs: English-Czech and English-Hindi. The two target languages, although both belonging to the Indo-European language family, show significant differences in morphology, syntax and word order. We describe how TectoMT, a successful framework for analysis and generation of language, can be used as a preprocessor for a phrase-based MT system. We compare the two language pairs and the optimal sets of source-language transformations applied to them. The following transformations are examples of possible preprocessing steps: lemmatization; retokenization, compound splitting; removing/adding words lacking counterparts in the other language; phrase reordering to resemble the target word order; marking syntactic functions. TectoMT, as well as all other tools and data sets we use, is freely available on the Web. © 2010 Springer-Verlag Berlin Heidelberg.
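
The preprocessing steps listed in the abstract can be illustrated with a minimal Python sketch. It does not use TectoMT or reproduce the paper's implementation: the `Token` class, the function labels (`subj`, `pred`, `obj`, `det`), the hand-annotated lemmas and all function names are hypothetical, chosen only to show how source-side transformations might be chained before the text is passed to a phrase-based system.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # surface word form
    lemma: str  # dictionary form (hand-annotated here, not produced by a real tagger)
    func: str   # syntactic function label (hypothetical tag set)


ARTICLES = {"a", "an", "the"}


def drop_articles(tokens):
    """Remove words lacking a counterpart in the target language,
    e.g. English articles, which Czech does not have."""
    return [t for t in tokens if t.lemma not in ARTICLES]


def reorder_verb_final(tokens):
    """Move predicate tokens to the end of the sentence to approximate
    a verb-final (SOV) target order such as that of Hindi."""
    non_verbs = [t for t in tokens if t.func != "pred"]
    verbs = [t for t in tokens if t.func == "pred"]
    return non_verbs + verbs


def render(tokens, lemmatize=False, mark_functions=False):
    """Turn tokens back into a string, optionally using lemmas and
    appending the syntactic function as a 'word|func' factor."""
    out = []
    for t in tokens:
        word = t.lemma if lemmatize else t.form
        if mark_functions:
            word = f"{word}|{t.func}"
        out.append(word)
    return " ".join(out)


def preprocess(tokens, steps, **render_opts):
    """Apply each transformation in order, then render the result."""
    for step in steps:
        tokens = step(tokens)
    return render(tokens, **render_opts)


# Toy input: "The children read the books", annotated by hand.
sentence = [
    Token("The", "the", "det"),
    Token("children", "child", "subj"),
    Token("read", "read", "pred"),
    Token("the", "the", "det"),
    Token("books", "book", "obj"),
]

print(preprocess(sentence, [drop_articles, reorder_verb_final], lemmatize=True))
print(preprocess(sentence, [drop_articles], mark_functions=True))
```

Keeping each transformation as an independent function makes it easy to switch steps on and off per language pair, which is the kind of comparison the abstract describes.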

Citation (APA)

Zeman, D. (2010). Using TectoMT as a preprocessing tool for phrase-based statistical machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6231 LNAI, pp. 216–223). https://doi.org/10.1007/978-3-642-15760-8_28
