Exploiting comparable corpora to enhance bilingual lexicon induction from monolingual corpora

Rizka Wakhidatus Sholikah; Yasuhiko Morimoto; Agus Zainal Arifin; Chastine Fatichah; Ayu Purwarianti

Journal Article, Open Access

International Journal of Intelligent Engineering and Systems (2020) 13(5) 379-391

DOI: 10.22266/ijies2020.1031.34

Abstract

Bilingual lexicons are essential resources in natural language processing (NLP) and information retrieval (IR). Automatic bilingual lexicon acquisition typically relies on large parallel corpora, which are scarce or even unavailable for several languages. Other resources can also be used to build bilingual lexicons, such as comparable corpora (aligned documents) and monolingual corpora, which are easy to obtain and available for any language, including resource-limited languages. This paper therefore proposes a two-stage framework that learns bilingual lexicons from monolingual corpora, enhanced with comparable corpora, without any additional resources. The first stage, comparable dictionary building, creates a coarse dictionary from comparable corpora using a topic modeling approach. The second stage, monolingual mapping, uses the result of the first stage as seed initialization for bi-directional projection learning. Using comparable corpora in this way replaces the need for a bilingual dictionary. Experiments were conducted on three language pairs: English→Indonesian, English→Arabic, and Arabic→Indonesian. The results show that the proposed method improves the accuracy of lexicon induction from monolingual corpora and outperforms previous methods.
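To make the second stage more concrete, the following is a minimal sketch of seed-initialized projection learning in both directions. It is not the authors' implementation: it assumes pre-trained monolingual word embeddings stored as NumPy arrays, a list of (source index, target index) seed pairs standing in for the coarse comparable dictionary, and plain least-squares linear maps. The function names `fit_linear_map` and `induce_lexicon` are hypothetical.

```python
import numpy as np


def unit_rows(M):
    """Scale each row to unit length so dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def fit_linear_map(X, Y):
    """Least-squares W minimizing ||X W - Y||_F (assumed stand-in for projection learning)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def induce_lexicon(src_emb, tgt_emb, seed_pairs):
    """Return, for every source word, the index of its best-scoring target word.

    seed_pairs: iterable of (source_index, target_index) taken from a coarse
    seed dictionary (assumed here to come from the first stage).
    """
    src, tgt = unit_rows(src_emb), unit_rows(tgt_emb)
    s_idx = [s for s, _ in seed_pairs]
    t_idx = [t for _, t in seed_pairs]

    # Learn one projection per direction from the seed pairs only.
    W_st = fit_linear_map(src[s_idx], tgt[t_idx])  # source -> target space
    W_ts = fit_linear_map(tgt[t_idx], src[s_idx])  # target -> source space

    # Score every (source, target) pair in both directions and sum the scores.
    sim_forward = (src @ W_st) @ tgt.T        # shape (n_src, n_tgt)
    sim_backward = ((tgt @ W_ts) @ src.T).T   # transposed to (n_src, n_tgt)
    return (sim_forward + sim_backward).argmax(axis=1)
```

Summing the forward and backward similarity scores is one simple way to combine the two directions; other combinations, such as keeping only mutual nearest neighbours, would fit the same outline.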

Cite

Sholikah, R. W., Morimoto, Y., Arifin, A. Z., Fatichah, C., & Purwarianti, A. (2020). Exploiting comparable corpora to enhance bilingual lexicon induction from monolingual corpora. International Journal of Intelligent Engineering and Systems, 13(5), 379–391. https://doi.org/10.22266/ijies2020.1031.34
