Only a few studies have made use of alignment information in bilingual lexicon extraction from comparable corpora, and those studies require the comparable corpora to be divided into 1-1 aligned document pairs. They have not been able to show that extracted lexicons benefit from the embedding of alignment information. Moreover, strict 1-1 alignments are not widely available in comparable corpora. In this paper we develop a language-independent approach to lexicon extraction that combines the classic lexical context with pseudo-alignment information. Experiments on an English-French comparable corpus demonstrate that pseudo-alignment in comparable corpora is an essential feature leading to a significant improvement over the standard method of lexicon extraction, a perspective that has never been investigated in a similar way by previous studies.
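To make the idea of combining lexical context with pseudo-alignment concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the "classic lexical context" is a window-based context vector translated through a seed dictionary and compared by cosine similarity, that pseudo-alignment can be represented by giving loosely related source and target documents the same list index, and that the two similarities are mixed linearly with a weight `lam`. The function names, the window size, and the linear combination are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_vector(word, docs, window=2):
    """Count the words occurring within `window` tokens of `word`."""
    vec = Counter()
    for doc in docs:
        for i, tok in enumerate(doc):
            if tok == word:
                lo = max(0, i - window)
                for ctx in doc[lo:i] + doc[i + 1:i + 1 + window]:
                    vec[ctx] += 1
    return vec


def translate_vector(vec, seed_dict):
    """Map a source-language context vector into the target language.

    Context words missing from the (hypothetical) seed dictionary are dropped.
    """
    out = Counter()
    for w, count in vec.items():
        if w in seed_dict:
            out[seed_dict[w]] += count
    return out


def alignment_vector(word, docs):
    """Binary vector over document indices: 1.0 where `word` occurs.

    Assumption: source document i is pseudo-aligned with target document i.
    """
    return {i: 1.0 for i, doc in enumerate(docs) if word in doc}


def combined_score(src, tgt, src_docs, tgt_docs, seed_dict, lam=0.5):
    """Linear mix of context similarity and pseudo-alignment similarity.

    The mixing weight `lam` is an illustrative assumption.
    """
    ctx_sim = cosine(
        translate_vector(context_vector(src, src_docs), seed_dict),
        context_vector(tgt, tgt_docs),
    )
    align_sim = cosine(
        alignment_vector(src, src_docs),
        alignment_vector(tgt, tgt_docs),
    )
    return lam * ctx_sim + (1 - lam) * align_sim
```

In this sketch, candidate translations for a source word would be ranked by `combined_score`; setting `lam=1.0` reduces it to the context-only baseline, which gives a simple way to compare the two settings on toy data.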
CITATION STYLE
Li, B., Zhu, Q., He, T., & Chen, Q. (2014). Combining lexical context with pseudo-alignment for bilingual lexicon extraction from comparable corpora. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8801, 223–233. https://doi.org/10.1007/978-3-319-12277-9_20