Using statistical translation models for bilingual IR

Abstract

This report describes our experiments on using statistical translation models for the bilingual IR tasks in CLEF-2001. The translation models were trained on a set of parallel web pages automatically mined from the Web. Our goal is to compare the following approaches:
- training the translation models on the original parallel corpus or on a cleaned version of it;
- weighting query words by the raw translation probabilities or by those probabilities combined with IDF;
- applying different cut-off probability values to the translation models (i.e. deleting translations whose probability falls below a threshold).

Our results show that:
- the models trained on the original parallel corpus work better than those trained on the cleaned corpus;
- combining the probabilities with IDF is beneficial;
- cutting off the translation models at a certain value (0.01 in our case) is better than not cutting them off.
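The two query-side choices compared in the abstract, the probability cut-off and the combination with IDF, can be illustrated with a minimal sketch. The abstract does not give the exact combination formula, so the product `p * log(N / df)` used here is an assumption, as are the function name `translate_query`, its parameters, and the toy model and document frequencies, which are invented for illustration and are not data from the paper.

```python
import math

def translate_query(query_words, model, doc_freq, num_docs,
                    cutoff=0.01, use_idf=True):
    """Turn source-language query words into weighted target-language words.

    model:    dict mapping source word -> {target word: translation probability}
    doc_freq: dict mapping target word -> number of documents containing it
    Assumption: the IDF combination is the product p * log(N / df).
    """
    weights = {}
    for src in query_words:
        for tgt, prob in model.get(src, {}).items():
            # Cut-off: drop translations whose probability is below the threshold.
            if prob < cutoff:
                continue
            if use_idf:
                df = doc_freq.get(tgt, 0)
                if df == 0:
                    # A word absent from the collection cannot match any document.
                    continue
                weight = prob * math.log(num_docs / df)
            else:
                weight = prob
            # Sum the contributions when several query words share a translation.
            weights[tgt] = weights.get(tgt, 0.0) + weight
    return weights

# Toy French-to-English model and collection statistics (made up).
toy_model = {
    "maison": {"house": 0.6, "home": 0.3, "the": 0.005},
    "blanche": {"white": 0.8, "blank": 0.1},
}
toy_doc_freq = {"house": 100, "home": 500, "white": 200, "blank": 50, "the": 1000}
toy_num_docs = 1000

weights_idf = translate_query(["maison", "blanche"], toy_model,
                              toy_doc_freq, toy_num_docs)
weights_raw = translate_query(["maison", "blanche"], toy_model,
                              toy_doc_freq, toy_num_docs, use_idf=False)
weights_nocut = translate_query(["maison", "blanche"], toy_model,
                                toy_doc_freq, toy_num_docs,
                                cutoff=0.0, use_idf=False)
```

In this toy setting the cut-off at 0.01 removes the noisy translation "the", and the IDF factor shifts weight toward the rarer word "house" relative to "white". Whether either effect helps retrieval is an empirical question of the kind the report evaluates.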

Cite

Nie, J. Y., & Simard, M. (2001). Using statistical translation models for bilingual IR. In CEUR Workshop Proceedings (Vol. 1167). CEUR-WS. https://doi.org/10.1007/3-540-45691-0_11
