Constructing Dataset Based on Concept Hierarchy for Evaluating Word Vectors Learned from Multisense Words

Abstract

Word embedding techniques, which assign a multidimensional vector to each word in a given corpus, are now widely used in Natural Language Processing tasks. Although most existing methods, such as word2vec, assign a single vector to each word, some more advanced methods assign multiple vectors to a multisense word, one for each of its meanings. However, it is difficult to properly evaluate the vectors assigned to multisense words using publicly available word similarity datasets. In this paper, we therefore propose a novel dataset and a corresponding evaluation metric for evaluating word vectors learned with multisense words taken into account. Instead of individual words, the proposed dataset consists of synsets from WordNet and BabelNet, two well-known lexical databases, and it incorporates the distance between synsets in the concept hierarchies of WordNet and BabelNet to evaluate the similarity between word vectors. We empirically show that the proposed dataset and evaluation metric evaluate word vectors for multisense words more appropriately than metrics based on an existing dataset.
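
The general idea of comparing hierarchy distance between synsets with similarity between sense vectors can be illustrated with a minimal sketch. This is not the dataset or the metric proposed in the paper, whose details the abstract does not give. The toy is-a hierarchy, the sense labels such as `bank.financial`, the three-dimensional vectors, and the helper names `path_distance` and `cosine` are all hypothetical and chosen only for illustration.

```python
import math

# Toy is-a hierarchy (child -> parent). Hypothetical: not taken from WordNet or BabelNet.
PARENT = {
    "bank.financial": "institution",
    "credit_union": "institution",
    "institution": "organization",
    "organization": "entity",
    "bank.river": "slope",
    "slope": "landform",
    "landform": "entity",
}

# Hypothetical per-sense vectors, standing in for learned multisense embeddings.
VECTORS = {
    "bank.financial": [0.9, 0.1, 0.0],
    "credit_union": [0.8, 0.2, 0.1],
    "bank.river": [0.0, 0.2, 0.9],
}


def ancestors(node):
    """Return the chain from node up to the root, starting with node itself."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_distance(a, b):
    """Count the edges between a and b via their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for steps_from_a, n in enumerate(ancestors(a)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes share no common ancestor")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for other in ("credit_union", "bank.river"):
        d = path_distance("bank.financial", other)
        s = cosine(VECTORS["bank.financial"], VECTORS[other])
        print(f"bank.financial vs {other}: distance={d}, cosine={s:.3f}")
```

In this toy setting, the two senses of "bank" are far apart in the hierarchy and their vectors have low cosine similarity, while the financial sense lies close to `credit_union` on both measures. A word-level dataset would collapse both senses into a single entry and could not make that distinction.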

Cite

Yamazaki, T., Toyota, T., & Ohara, K. (2019). Constructing Dataset Based on Concept Hierarchy for Evaluating Word Vectors Learned from Multisense Words. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11669 LNAI, pp. 81–96). Springer Verlag. https://doi.org/10.1007/978-3-030-30639-7_8