Computing semantic similarity using large static corpora

Abstract

Measuring the semantic similarity of words is of crucial importance in Natural Language Processing. Although there are many different approaches to this task, there is still room for improvement. In contrast to many other methods, which use web search engines or large lexical databases, we developed methods that rely solely on large static corpora. They create a binary or numerical feature vector for each word, making use of statistical information obtained from the corpora. These vectors contain features based on context words or grammatical relations extracted from the corpora, and they employ diverse weighting schemes. After creating the feature vectors, word similarity is calculated using various vector similarity measures. Besides the individual methods, their combinations were also tested. Evaluated on both the Miller-Charles dataset and the TOEFL synonym questions, they achieve results competitive with recent methods. © 2013 Springer-Verlag Berlin Heidelberg.
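The general pipeline the abstract describes, building weighted context-word feature vectors from a corpus and then comparing them with a vector similarity measure, can be sketched as follows. This is an illustrative sketch only, not the authors' exact method: the window size, the PPMI weighting scheme, and cosine similarity are assumptions chosen as one common instantiation of each component.

```python
# Illustrative sketch (assumptions: window-based contexts, PPMI weighting,
# cosine similarity) of corpus-based word similarity, not the paper's
# exact pipeline.
import math
from collections import Counter, defaultdict

def build_vectors(sentences, window=2):
    """Build a PPMI-weighted context-word feature vector for each word."""
    cooc = defaultdict(Counter)   # word -> context-word co-occurrence counts
    word_counts = Counter()       # unigram counts
    total = 0                     # total tokens in the corpus
    for sent in sentences:
        for i, w in enumerate(sent):
            word_counts[w] += 1
            total += 1
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[w][sent[j]] += 1
    # Re-weight raw counts with positive pointwise mutual information:
    # PPMI(w, c) = max(0, log(P(w, c) / (P(w) * P(c)))).
    pair_total = sum(sum(ctx.values()) for ctx in cooc.values())
    vectors = {}
    for w, ctx in cooc.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log((n * total * total) /
                           (pair_total * word_counts[w] * word_counts[c]))
            if pmi > 0:
                vec[c] = pmi
        vectors[w] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse (dict-based) vectors."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the car drove on the road".split(),
]
vecs = build_vectors(corpus)
# "cat" and "dog" share more weighted contexts than "cat" and "car",
# so the former pair should score higher.
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["car"]))
```

In the paper the feature vectors may also be binary, use grammatical relations instead of context windows, and be compared with other vector similarity measures; swapping those components into the sketch above only changes the weighting and comparison functions.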

Cite

CITATION STYLE

APA

Dobó, A., & Csirik, J. (2013). Computing semantic similarity using large static corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7741 LNCS, pp. 491–502). https://doi.org/10.1007/978-3-642-35843-2_42
