Automatic segmentation of bilingual corpora: A comparison of different techniques

Abstract

Segmentation of bilingual text corpora is an important problem in machine translation. In this paper we present SPBalign, a new method for bilingual segmentation of a parallel corpus that is based on phrase-based statistical translation models. We compare it with two existing techniques that are also based on statistical translation methods: RECalign, which is based on the concept of recursive alignment, and GIATIalign, which is based on simple word alignments. Experiments are carried out on the EuTRANS-I English-Spanish task, with the aim of creating new, shorter bilingual segments to be included in a translation memory database. The three methods are evaluated by comparing the bilingual segmentations they produce against a manually segmented bilingual test corpus. The results show that the new method outperforms the two previously proposed bilingual segmentation techniques in all cases. © Springer-Verlag Berlin Heidelberg 2005.
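
The abstract says the methods are evaluated against a manually segmented test corpus but does not name the metric. As a minimal sketch of one way such a comparison could be scored, the Python below computes exact-match precision, recall and F1 between an automatic and a reference segmentation of a single sentence pair. The span representation, the function name `segmentation_scores` and the example indices are all hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Assumption: a bilingual segment pairs a source span with a target span,
# using half-open word indices (src_start, src_end, tgt_start, tgt_end).
Segment = Tuple[int, int, int, int]


def segmentation_scores(
    hypothesis: Set[Segment], reference: Set[Segment]
) -> Tuple[float, float, float]:
    """Return exact-match (precision, recall, F1) of hypothesis vs. reference."""
    matched = len(hypothesis & reference)  # segments identical on both sides
    precision = matched / len(hypothesis) if hypothesis else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the reference splits a sentence pair into two
# segments, while the automatic segmentation splits the second one further.
reference = {(0, 2, 0, 2), (2, 4, 2, 5)}
hypothesis = {(0, 2, 0, 2), (2, 3, 2, 3), (3, 4, 3, 5)}

precision, recall, f1 = segmentation_scores(hypothesis, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is the strictest choice: a segment counts only if both its source and target spans coincide with a reference segment. A boundary-based or partial-overlap score would be more lenient, and the paper may well use a different measure.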

Cite

Varea, I. G., Ortiz, D., Nevado, F., Gómez, P. A., & Casacuberta, F. (2005). Automatic segmentation of bilingual corpora: A comparison of different techniques. In Lecture Notes in Computer Science (Vol. 3523, pp. 614–621). https://doi.org/10.1007/11492542_75
