A statistical view on bilingual lexicon extraction

Pascale Fung

Book Chapter

DOI: 10.1007/978-94-017-2535-4_11

Abstract

We address two problems in the statistical extraction of bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contributions in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora, based on the arrival distances of words in those corpora. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show 55.35% precision from a small corpus and 89.93% precision from a larger corpus. Our major contribution is the extraction of bilingual lexicons from non-parallel corpora. We present a first such result in this area, from a new method, Convec. Convec is based on the context information of the word to be translated.
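
To make the notion of "arrival distances" concrete, the following is a minimal sketch, assuming the term means the gaps (in tokens) between successive occurrences of a word in a text. The function name `arrival_distances` and the toy sentence are illustrative assumptions and do not come from the chapter; the abstract does not say how DKvec compares such sequences across languages, so that step is not shown.

```python
def arrival_distances(tokens, word):
    """Return the gaps, in tokens, between successive occurrences of `word`.

    Assumption: "arrival distance" is read here as the difference between
    consecutive positions of the same word in a tokenized text.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Toy example (hypothetical text, whitespace tokenization).
tokens = "the bank raised rates and the bank cut fees so the bank grew".split()
print(arrival_distances(tokens, "bank"))  # → [5, 5]
```

A word that occurs fewer than twice yields an empty list, so a sketch like this is only informative for words that recur in the corpus.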

Citation (APA)

Fung, P. (2000). A statistical view on bilingual lexicon extraction (pp. 219–236). https://doi.org/10.1007/978-94-017-2535-4_11
