Using statistical translation models for bilingual IR

Jian Yun Nie; Michel Simard

Conference Proceedings

CEUR Workshop Proceedings (2001) 1167

DOI: 10.1007/3-540-45691-0_11

Abstract

This report describes our tests of statistical translation models for the bilingual IR tasks of CLEF-2001. The translation models were trained on a set of parallel web pages automatically mined from the Web. Our goal is to compare the following approaches:

- training the translation models on the original parallel corpora or on the cleaned corpora;
- weighting query words by the raw translation probabilities alone or by those probabilities combined with IDF;
- applying different cut-off probability values to the translation models (i.e., deleting translations whose probability falls below a threshold).

Our results show that:

- the models trained on the original parallel corpora work better than those trained on the cleaned corpora;
- combining the probabilities with IDF is beneficial;
- cutting off the translation models at a certain value (0.01 in our case) works better than not cutting them off.
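The query-weighting and cut-off ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how probabilities and IDF are combined, so a simple product is assumed here, and the function names, the toy translation table, and the document frequencies are all hypothetical. Only the 0.01 threshold comes from the abstract.

```python
import math

def prune_translations(translations, threshold=0.01):
    """Delete candidate translations whose probability is below the threshold."""
    return {t: p for t, p in translations.items() if p >= threshold}

def idf(term, doc_freq, num_docs):
    """Plain inverse document frequency; unseen terms get a weight of 0."""
    df = doc_freq.get(term, 0)
    return math.log(num_docs / df) if df else 0.0

def translate_query(query, model, doc_freq, num_docs, threshold=0.01, use_idf=True):
    """Map source-language query words to weighted target-language terms.

    Assumption: the weight is probability * IDF when use_idf is True,
    and the raw probability otherwise.
    """
    weights = {}
    for word in query:
        kept = prune_translations(model.get(word, {}), threshold)
        for target, prob in kept.items():
            w = prob * idf(target, doc_freq, num_docs) if use_idf else prob
            weights[target] = weights.get(target, 0.0) + w
    return weights

# Toy data, invented for illustration only.
toy_model = {"maison": {"house": 0.6, "home": 0.3, "the": 0.005}}
toy_doc_freq = {"house": 10, "home": 100, "the": 1000}
toy_num_docs = 1000

weighted = translate_query(["maison"], toy_model, toy_doc_freq, toy_num_docs)
raw = translate_query(["maison"], toy_model, toy_doc_freq, toy_num_docs, use_idf=False)
```

In this toy setup the low-probability translation "the" is dropped by the cut-off, and the IDF factor widens the gap between the rarer term "house" and the more common term "home" compared with the raw probabilities.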

Cite

Nie, J. Y., & Simard, M. (2001). Using statistical translation models for bilingual IR. In CEUR Workshop Proceedings (Vol. 1167). CEUR-WS. https://doi.org/10.1007/3-540-45691-0_11