The challenges of optimizing machine translation for low resource cross-language information retrieval

Abstract

When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT). However, there is no established guidance on how to optimize the resulting MT-IR system. In this paper, we examine the relationship between the performance of MT systems and both neural and term frequency-based IR models to identify how CLIR performance can be best predicted from MT quality. We explore performance at varying amounts of MT training data, byte pair encoding (BPE) merge operations, and across two IR collections and retrieval models. We find that the choice of IR collection can substantially affect the predictive power of MT tuning decisions and evaluation, potentially introducing dissociations between MT-only and overall CLIR performance.
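
One simple way to quantify how well MT quality predicts CLIR performance is to correlate an MT metric with an IR metric across MT configurations, separately for each IR collection. The sketch below is illustrative only and is not the authors' analysis: the function name `pearson` is our own, and the score lists are hypothetical placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, one entry per MT configuration
# (e.g. a given training-data size and number of BPE merges).
# These are NOT numbers from the paper.
mt_quality = [10.0, 14.0, 18.0, 21.0]        # assumed MT metric, e.g. BLEU
clir_collection_a = [0.20, 0.25, 0.31, 0.36]  # assumed IR metric, e.g. MAP
clir_collection_b = [0.30, 0.28, 0.33, 0.29]  # same metric, second collection

for name, scores in [("A", clir_collection_a), ("B", clir_collection_b)]:
    print(f"collection {name}: r = {pearson(mt_quality, scores):.3f}")
```

A high correlation on one collection and a low one on another would be an instance of the dissociation the abstract describes, where MT-only evaluation fails to predict overall CLIR performance.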

Cite

Lignos, C., Cohen, D., Lien, Y. C., Mehta, P., Croft, W. B., & Miller, S. (2019). The challenges of optimizing machine translation for low resource cross-language information retrieval. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 3497–3502). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1353
