Segmentation of bilingual text corpora is an important problem in machine translation. In this paper we present SPBalign, a new method for bilingual segmentation of a parallel corpus that is based on phrase-based statistical translation models. The proposed technique is compared with two existing techniques that are also based on statistical translation methods: RECalign, which is based on the concept of recursive alignment, and GIATIalign, which is based on simple word alignments. Experiments were carried out on the EuTRANS-I English-Spanish task, with the aim of creating new, shorter bilingual segments to be included in a translation memory database. The three methods were evaluated by comparing the bilingual segmentations they produce against a manually segmented bilingual test corpus. The results show that the new method outperforms the two previously proposed bilingual segmentation techniques in all cases. © Springer-Verlag Berlin Heidelberg 2005.
CITATION STYLE
Varea, I. G., Ortiz, D., Nevado, F., Gómez, P. A., & Casacuberta, F. (2005). Automatic segmentation of bilingual corpora: A comparison of different techniques. In Lecture Notes in Computer Science (Vol. 3523, pp. 614–621). https://doi.org/10.1007/11492542_75