Computing semantic similarity using large static corpora

Abstract

Measuring the semantic similarity of words is of crucial importance in Natural Language Processing. Although there are many different approaches to this task, there is still room for improvement. In contrast to many other methods, which use web search engines or large lexical databases, we developed methods that rely solely on large static corpora. They create a binary or numerical feature vector for each word, making use of statistical information obtained from the corpora. These vectors contain features based on context words or grammatical relations extracted from the corpora, and they employ diverse weighting schemes. After creating the feature vectors, word similarity is calculated using various vector similarity measures. Besides the individual methods, their combinations were also tested. Evaluated on both the Miller-Charles dataset and the TOEFL synonym questions, they achieve results competitive with recent methods. © 2013 Springer-Verlag Berlin Heidelberg.
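The general pipeline the abstract describes, building weighted context-word feature vectors from a corpus and then comparing them with a vector similarity measure, can be sketched as follows. This is an illustrative sketch only, not the authors' exact method: the window size, the PPMI weighting scheme, and cosine similarity are assumptions chosen as one common instantiation of each component.

```python
# Illustrative sketch (assumptions: window-based contexts, PPMI weighting,
# cosine similarity) of corpus-based word similarity, not the paper's
# exact pipeline.
import math
from collections import Counter, defaultdict

def build_vectors(sentences, window=2):
    """Build a PPMI-weighted context-word feature vector for each word."""
    cooc = defaultdict(Counter)   # word -> context-word co-occurrence counts
    word_counts = Counter()       # unigram counts
    total = 0                     # total tokens in the corpus
    for sent in sentences:
        for i, w in enumerate(sent):
            word_counts[w] += 1
            total += 1
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[w][sent[j]] += 1
    # Re-weight raw counts with positive pointwise mutual information:
    # PPMI(w, c) = max(0, log(P(w, c) / (P(w) * P(c)))).
    pair_total = sum(sum(ctx.values()) for ctx in cooc.values())
    vectors = {}
    for w, ctx in cooc.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log((n * total * total) /
                           (pair_total * word_counts[w] * word_counts[c]))
            if pmi > 0:
                vec[c] = pmi
        vectors[w] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse (dict-based) vectors."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the car drove on the road".split(),
]
vecs = build_vectors(corpus)
# "cat" and "dog" share more weighted contexts than "cat" and "car",
# so the former pair should score higher.
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["car"]))
```

In the paper the feature vectors may also be binary, use grammatical relations instead of context windows, and be compared with other vector similarity measures; swapping those components into the sketch above only changes the weighting and comparison functions.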

Cite

CITATION STYLE

APA

Dobó, A., & Csirik, J. (2013). Computing semantic similarity using large static corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7741 LNCS, pp. 491–502). https://doi.org/10.1007/978-3-642-35843-2_42
