A probabilistic model based on n-grams for bilingual word sense disambiguation

Darnes Vilariño; David Pinto; Mireya Tovar; Carlos Balderas; Beatriz Beltrán

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010), vol. 6437 LNAI (Part 1), pp. 82-91

DOI: 10.1007/978-3-642-16761-4_8

Abstract

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing. WSD is difficult in itself, and its bilingual version is considerably more complex: it is necessary not only to find a correct translation, but that translation must also take into account the contextual senses of the original sentence (in the source language) in order to identify the correct sense (in the target language) of the source word. In this paper we propose a model based on n-grams (3-grams and 5-grams) that significantly outperforms the results we previously presented at the cross-lingual word sense disambiguation task of the SemEval-2 forum. We use a naïve Bayes classifier to determine the probability of a target sense (in the target language) given a sentence that contains the ambiguous word (in the source language). For this purpose we use a bilingual statistical dictionary, computed with Giza++ from the EUROPARL parallel corpus, to determine the probability that a source word is translated to a target word (which is assumed to be the correct sense of the source word, but in a different language). The results were compared with those of an international competition, and the model performed well. © 2010 Springer-Verlag.
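The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The translation table, its probability values, the smoothing floor, the uniform prior, the centred n-gram window, and all function names are assumptions made for illustration. In the paper the probabilities come from a Giza++ dictionary built on EUROPARL.

```python
import math

# Hypothetical toy table: p(source_word | target_sense).
# The paper derives such probabilities from a Giza++ dictionary trained on
# EUROPARL; every number below is invented for illustration only.
TRANSLATION_PROB = {
    "banco": {"bank": 0.6, "money": 0.2, "the": 0.05, "account": 0.15},
    "orilla": {"bank": 0.5, "river": 0.3, "the": 0.05, "water": 0.15},
}

SMOOTHING = 1e-6  # assumed floor for unseen (source word, target sense) pairs


def ngram_window(tokens, index, n):
    """Return the n-gram of size n centred on tokens[index], clipped at edges."""
    half = n // 2
    start = max(0, index - half)
    end = min(len(tokens), index + half + 1)
    return tokens[start:end]


def score_senses(tokens, index, candidates, n=3):
    """Naive Bayes log-score of each candidate target sense (uniform prior)."""
    window = ngram_window(tokens, index, n)
    log_prior = math.log(1.0 / len(candidates))
    scores = {}
    for sense in candidates:
        table = TRANSLATION_PROB.get(sense, {})
        # Independence assumption: sum the log-probabilities of the window words.
        scores[sense] = log_prior + sum(
            math.log(table.get(word, SMOOTHING)) for word in window
        )
    return scores


def disambiguate(tokens, index, candidates, n=3):
    """Pick the candidate target sense with the highest log-score."""
    scores = score_senses(tokens, index, candidates, n)
    return max(scores, key=scores.get)


sentence = "she sat on the river bank yesterday".split()
best = disambiguate(sentence, sentence.index("bank"), ["banco", "orilla"], n=3)
print(best)
```

Passing `n=5` widens the window to a 5-gram, the other context size the abstract mentions. Log-probabilities are summed instead of multiplying raw probabilities so that longer windows do not underflow.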

Cite

Vilariño, D., Pinto, D., Tovar, M., Balderas, C., & Beltrán, B. (2010). A probabilistic model based on n-grams for bilingual word sense disambiguation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6437 LNAI, pp. 82–91). https://doi.org/10.1007/978-3-642-16761-4_8
