Constructing Dataset Based on Concept Hierarchy for Evaluating Word Vectors Learned from Multisense Words

Tomoaki Yamazaki; Tetsuya Toyota; Kouzou Ohara

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11669 LNAI 81-96

DOI: 10.1007/978-3-030-30639-7_8

Abstract

Word embedding techniques, which assign a multidimensional vector to each word in a given corpus, are now widely used in Natural Language Processing tasks. Most existing methods, such as word2vec, assign a single vector to each word, but some more advanced ones assign multiple vectors to a multisense word, one for each of its meanings. However, it is difficult to properly evaluate the vectors assigned to multisense words using publicly available word similarity datasets. In this paper, we therefore propose a novel dataset and a corresponding evaluation metric for evaluating word vectors learned with multisense words taken into account. Instead of individual words, the proposed dataset consists of synsets from WordNet and BabelNet, two well-known lexical databases, and it incorporates the distance between synsets in the concept hierarchies of WordNet and BabelNet to evaluate the similarity between word vectors. We empirically show that the proposed dataset and evaluation metric allow word vectors for multisense words to be evaluated more appropriately than metrics for an existing dataset.
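
The general idea of scoring sense vectors against distances in a concept hierarchy can be illustrated with a minimal sketch. It is not the authors' dataset or metric, which the abstract does not specify. Everything here is an assumption made for illustration: the toy hierarchy in `PARENT`, the synset names, the made-up vectors in `SENSE_VECTORS`, the choice of `1 / (1 + distance)` as the gold similarity, and the use of Spearman rank correlation as the score.

```python
import math
from itertools import combinations

# Hypothetical toy hierarchy: child synset -> parent (hypernym) synset.
# These names are illustrative only, not real WordNet or BabelNet IDs.
PARENT = {
    "bank.financial": "institution",
    "credit_union": "institution",
    "bank.river": "slope",
    "institution": "entity",
    "slope": "entity",
}

# Made-up sense vectors, one per synset (assumption, not learned embeddings).
SENSE_VECTORS = {
    "bank.financial": [0.9, 0.1, 0.0],
    "credit_union": [0.8, 0.2, 0.1],
    "bank.river": [0.1, 0.9, 0.2],
}


def ancestors(synset):
    """Return the chain [synset, parent, ..., root]."""
    chain = [synset]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_distance(a, b):
    """Number of edges between two synsets via their lowest common ancestor."""
    depth_in_b = {s: i for i, s in enumerate(ancestors(b))}
    for i, s in enumerate(ancestors(a)):
        if s in depth_in_b:
            return i + depth_in_b[s]
    raise ValueError("synsets share no common ancestor")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(vectors):
    """Correlate hierarchy-based similarity with vector cosine similarity."""
    pairs = list(combinations(sorted(vectors), 2))
    gold = [1.0 / (1.0 + path_distance(a, b)) for a, b in pairs]
    predicted = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return spearman(gold, predicted)


if __name__ == "__main__":
    print(f"Spearman correlation: {evaluate(SENSE_VECTORS):.3f}")
```

Because each synset, rather than each surface word, carries its own vector, the two senses of "bank" in this toy example are compared separately against the hierarchy. A word-level similarity dataset cannot make that distinction.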

Citation (APA)

Yamazaki, T., Toyota, T., & Ohara, K. (2019). Constructing Dataset Based on Concept Hierarchy for Evaluating Word Vectors Learned from Multisense Words. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11669 LNAI, pp. 81–96). Springer Verlag. https://doi.org/10.1007/978-3-030-30639-7_8
