Automatic segmentation of bilingual corpora: A comparison of different techniques

Ismael García Varea; Daniel Ortiz; Francisco Nevado; Pedro A. Gómez; Francisco Casacuberta

Conference Proceedings

Lecture Notes in Computer Science (2005) 3523(II) 614-621

DOI: 10.1007/11492542_75

Abstract

Segmentation of bilingual text corpora is an important problem in machine translation. In this paper we present SPBalign, a new method for bilingual segmentation of a parallel corpus that is based on phrase-based statistical translation models. The new technique is compared with two existing techniques that are also based on statistical translation methods: RECalign, which is based on the concept of recursive alignment, and GIATIalign, which is based on simple word alignments. Experiments are carried out on the EuTRANS-I English-Spanish task, with the aim of creating new, shorter bilingual segments to be included in a translation memory database. The three methods are evaluated by comparing the bilingual segmentations they produce against a manually segmented bilingual test corpus. The results show that the new method outperforms the two previously proposed bilingual segmentation techniques in all cases. © Springer-Verlag Berlin Heidelberg 2005.

Cite

APA

Varea, I. G., Ortiz, D., Nevado, F., Gómez, P. A., & Casacuberta, F. (2005). Automatic segmentation of bilingual corpora: A comparison of different techniques. In Lecture Notes in Computer Science (Vol. 3523, pp. 614–621). https://doi.org/10.1007/11492542_75
