Parallel texts extraction from multimodal comparable corpora

Abstract

Statistical machine translation (SMT) systems depend on the availability of domain-specific bilingual parallel text. However, parallel corpora are a limited resource and are often unavailable for some domains or language pairs. We analyze the feasibility of extracting parallel sentences from multimodal comparable corpora. This work extends the use of comparable corpora by using audio sources instead of text on the source side. The audio is transcribed by an automatic speech recognition system and translated with a baseline SMT system. We then use information retrieval over a large text corpus in the target language to extract parallel sentences. We have performed a series of experiments on data from the IWSLT'11 speech translation task that show the feasibility of our approach. © 2012 Springer-Verlag Berlin Heidelberg.
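The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not specify the retrieval model, so TF-IDF cosine similarity with a fixed acceptance threshold is assumed here, and every name (`build_index`, `extract_pairs`, `threshold`) is hypothetical. The speech recognition and baseline SMT stages are assumed to have already run, so the function receives the source-side transcripts and their machine translations as plain strings.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use proper tokenization.
    return text.lower().split()


def build_index(corpus):
    """Build TF-IDF vectors for the target-language sentences (assumed IR model)."""
    docs = [Counter(tokenize(sentence)) for sentence in corpus]
    doc_freq = Counter(term for doc in docs for term in doc)
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: f * idf[t] for t, f in doc.items()} for doc in docs]
    unseen_idf = math.log(1 + n) + 1.0  # weight for terms absent from the corpus
    return vectors, idf, unseen_idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_pairs(sources, translations, corpus, threshold=0.5):
    """Pair each source sentence with its best-matching target sentence.

    `translations` are machine translations of `sources`; they serve as
    queries against `corpus`. A pair is kept only if the similarity
    reaches `threshold` (a hypothetical filtering criterion).
    """
    if not corpus:
        return []
    vectors, idf, unseen_idf = build_index(corpus)
    pairs = []
    for src, hyp in zip(sources, translations):
        counts = Counter(tokenize(hyp))
        query = {t: f * idf.get(t, unseen_idf) for t, f in counts.items()}
        score, best = max((cosine(query, v), i) for i, v in enumerate(vectors))
        if score >= threshold:
            pairs.append((src, corpus[best], score))
    return pairs
```

Each returned tuple holds the source-side transcript, the retrieved target-language sentence, and the similarity score. Raising `threshold` trades recall for precision in the extracted sentence pairs.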

Cite

Afli, H., Barrault, L., & Schwenk, H. (2012). Parallel texts extraction from multimodal comparable corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7614 LNAI, pp. 41–51). https://doi.org/10.1007/978-3-642-33983-7_5
