This report describes our tests of statistical translation models for the bilingual IR tasks in CLEF-2001. The translation models were trained on a set of parallel web pages automatically mined from the Web. Our goal is to compare the following approaches:
- training the translation models on the original parallel corpus or on a cleaned corpus;
- weighting query words with the raw translation probabilities or combining those probabilities with IDF;
- using different cut-off probability values in the translation models (i.e., deleting translations whose probability falls below a threshold).
Our results show that:
- the models trained on the original parallel corpus work better than those trained on the cleaned corpus;
- combining the translation probabilities with IDF is beneficial;
- cutting off the translation models at a certain value (0.01 in our case) works better than not cutting them at all.
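To make the compared settings concrete, here is a minimal sketch of query translation with a probability cut-off and optional IDF weighting. It is an illustration under stated assumptions, not the authors' implementation. The translation table format, the function names (`cut_off`, `idf`, `translate_query`), the IDF formula log(N / df), the absence of renormalization after pruning, and the product combination p(t|s)·idf(t) are all hypothetical choices made for this example. Only the 0.01 threshold comes from the text.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical table format: source word -> {target word: p(target | source)}
TranslationTable = Dict[str, Dict[str, float]]


def cut_off(table: TranslationTable, threshold: float = 0.01) -> TranslationTable:
    """Delete translations whose probability is below the threshold.

    Assumption: the remaining probabilities are NOT renormalized.
    """
    return {
        src: {tgt: p for tgt, p in cands.items() if p >= threshold}
        for src, cands in table.items()
    }


def idf(term: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    """Hypothetical IDF variant log(N / df); an unseen term is given df = 1."""
    return math.log(num_docs / doc_freq.get(term, 1))


def translate_query(
    query: List[str],
    table: TranslationTable,
    doc_freq: Optional[Dict[str, int]] = None,
    num_docs: int = 0,
) -> Dict[str, float]:
    """Build a weighted target-language query.

    Raw mode:  weight(t) = sum over query words s of p(t | s).
    IDF mode:  weight(t) is additionally multiplied by idf(t)
               (assumed combination; the original may differ).
    """
    weights: Dict[str, float] = defaultdict(float)
    for src in query:
        for tgt, p in table.get(src, {}).items():
            weights[tgt] += p
    if doc_freq is not None:
        if num_docs <= 0:
            raise ValueError("num_docs must be positive when doc_freq is given")
        for tgt in weights:
            weights[tgt] *= idf(tgt, doc_freq, num_docs)
    return dict(weights)
```

As a toy usage example, pruning `{"maison": {"house": 0.6, "home": 0.35, "building": 0.005}}` at 0.01 drops `building`. With hypothetical document frequencies of 100 for `house` and 10 for `home` in a 1,000-document collection, IDF weighting ranks the rarer `home` above `house`, even though its raw translation probability is lower.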
Nie, J. Y., & Simard, M. (2001). Using statistical translation models for bilingual IR. In CEUR Workshop Proceedings (Vol. 1167). CEUR-WS. https://doi.org/10.1007/3-540-45691-0_11