A statistical view on bilingual lexicon extraction

  • Fung P

Abstract

We present two problems in statistical bilingual lexicon extraction: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora, based on the arrival distances of words in those corpora. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show 55.35% precision on a small corpus and 89.93% precision on a larger corpus. Our major contribution is the extraction of bilingual lexicons from non-parallel corpora. We present a first result of this kind, from a new method, Convec, which is based on the context information of the word to be translated.
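The abstract names the two signals, word arrival distances (DKvec) and word context (Convec), without giving formulas. The Python sketch below illustrates one way each signal could be computed and compared; it is not the author's implementation. Using dynamic time warping (DTW) to compare arrival-distance vectors, the fixed co-occurrence window, the seed lexicon that projects context vectors across languages, and cosine scoring are all assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter


def arrival_distances(tokens, word):
    """Gaps between successive occurrences of `word` (first entry: offset from text start)."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    if not positions:
        return []
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences.

    ASSUMPTION: DTW is one plausible way to compare arrival-distance vectors
    of unequal length; the abstract does not specify the comparison.
    """
    if not a or not b:
        return math.inf
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def context_vector(tokens, word, window=2):
    """Counts of tokens within `window` positions of each occurrence of `word`."""
    ctx = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            ctx.update(tokens[j] for j in range(start, end) if j != i)
    return ctx


def map_context(vec, seed_lexicon):
    """Project a source-language context vector into the target language.

    ASSUMPTION: `seed_lexicon` is a hypothetical dict of known source->target
    translations; context words without an entry are dropped.
    """
    mapped = Counter()
    for w, c in vec.items():
        if w in seed_lexicon:
            mapped[seed_lexicon[w]] += c
    return mapped


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(src_tokens, word, tgt_tokens, candidates, seed_lexicon, window=2):
    """Rank target-language candidates by context similarity to `word` (highest first)."""
    src_vec = map_context(context_vector(src_tokens, word, window), seed_lexicon)
    scored = [(c, cosine(src_vec, context_vector(tgt_tokens, c, window))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a low `dtw_distance` between the arrival-distance vectors of a source word and a candidate in a noisy parallel pair would suggest a translation match, while `rank_candidates` needs no alignment at all: on comparable corpora it only asks whether a candidate's neighbours look like translations of the source word's neighbours.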

Citation (APA)

Fung, P. (2000). A statistical view on bilingual lexicon extraction (pp. 219–236). https://doi.org/10.1007/978-94-017-2535-4_11
