Do scaling algorithms preserve word2Vec Semantics? A case study for medical entities

0Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The exponential increase of scientific publications in the bio-medical field challenges access to scientific information, which primarily is encoded by semantic relationships between medical entities, such as active ingredients, diseases, or genes. Neural language models, such as Word2Vec, offer new ways of automatically learning semantically meaningful entity relationships even from large text corpora. They offer high scalability and deliver better accuracy than comparable approaches. Still, first the models have to be tuned by testing different training parameters. Arguably, the most critical parameter is the number of training dimensions for the neural network training and testing individually different numbers of dimensions is time-consuming. It usually takes hours or even days per training iteration on large corpora. In this paper we show a more efficient way to determine the optimal number of dimensions concerning quality measures such as precision/recall. We show that the quality of results gained using simpler and easier to compute scaling approaches like MDS or PCA correlates strongly with the expected quality when using the same number of Word2Vec training dimensions. This has even more impact if after initial Word2Vec training only a limited number of entities and their respective relations are of interest.

Cite

CITATION STYLE

APA

Wawrzinek, J., Pinto, J. M. G., Markiewka, P., & Balke, W. T. (2019). Do scaling algorithms preserve word2Vec Semantics? A case study for medical entities. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11371 LNBI, pp. 3–16). Springer Verlag. https://doi.org/10.1007/978-3-030-06016-9_1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free