Simulating CLIR translation resource scarcity using high-resource languages

Abstract

We study the impact of translation resource scarcity on the performance of cross-language information retrieval (CLIR) systems. To do so, we develop a contrastive analysis framework that uses high-resource languages to simulate low-resource languages. Within the framework, we focus on parallel translation corpora and aim to better understand the factors that affect CLIR performance. We argue that both low- and high-resource corpora are needed to develop that understanding. Hence, we start with a true low-resource language and systematically down-sample a high-resource language until it becomes an artificial low-resource language, the reverse perspective of existing research. We formalize this as the Resource Scarcity Simulation (RSS) problem. We model the problem with a family of set covering problems, formulate it with integer linear programming, and prove that it is NP-hard. We therefore provide two greedy algorithms with polynomial complexity. We compare and analyze our approach against alternative techniques using four high-resource languages (French, Italian, German, and Finnish) down-sampled to simulate two low-resource languages (Somali and Swahili). Our experimental results suggest that language families are important for the RSS problem. We simulate Somali with German and Swahili with Finnish, achieving similarity percentages of 98% and 97% in terms of CLIR performance, respectively.
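The abstract does not spell out the two greedy algorithms, so the following is only a minimal sketch of the textbook greedy heuristic for plain set covering, the problem family the abstract says RSS is modeled on. It is not the authors' method. The function name `greedy_set_cover` and the toy data (sentences represented as sets of terms) are hypothetical and chosen purely for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy heuristic for set cover (illustrative only).

    universe: iterable of elements that must be covered.
    subsets:  list of sets; the function returns the indices of the chosen sets.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        # Ties go to the lowest index because max() keeps the first maximum.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: each "sentence" is the set of terms it contains.
sentences = [
    {"water", "food"},
    {"food", "road"},
    {"road", "school", "water"},
    {"school"},
]
vocabulary = {"water", "food", "road", "school"}

print(greedy_set_cover(vocabulary, sentences))  # prints [2, 0]
```

Each iteration scans every subset once, so the running time is polynomial in the input size, which is the usual motivation for a greedy heuristic when the exact problem is NP-hard.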

Cite

Bonab, H., Allan, J., & Sitaraman, R. (2019). Simulating CLIR translation resource scarcity using high-resource languages. In ICTIR 2019 - Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (pp. 129–136). Association for Computing Machinery, Inc. https://doi.org/10.1145/3341981.3344236