A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models

András Dobó; János Csirik

Journal Article, Open Access

Journal of Quantitative Linguistics (2020) 27(3) 244-271

DOI: 10.1080/09296174.2019.1570897

Abstract

Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models that compute these measures have many parameters, such as the weighting scheme, the vector similarity measure, the feature transformation function and the dimensionality reduction technique. Despite the importance of these parameters, there is no truly comprehensive study that evaluates them simultaneously while also considering how they interact with each other. We address this gap with a systematic study. Taking the distributional information extracted from the chosen dataset as given, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing 10 parameters simultaneously, we search for the best combination of parameter settings, examining a large number of settings for some of the parameters. Besides evaluating the conventionally used settings, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations in general use, thus achieving state-of-the-art results.
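To make two of the parameters named in the abstract concrete, the following is a minimal sketch of a weighting scheme and a vector similarity measure applied to sparse feature vectors. It assumes positive pointwise mutual information (PPMI) as the weighting and cosine as the similarity; both are common choices picked here for illustration only, and the abstract does not state which settings the authors found best. The function names and the toy co-occurrence counts are hypothetical.

```python
import math


def ppmi_weight(counts):
    """Turn raw co-occurrence counts into PPMI-weighted sparse vectors.

    counts: dict mapping word -> dict mapping feature -> raw count.
    PPMI is an assumed, illustrative weighting scheme.
    """
    total = sum(c for feats in counts.values() for c in feats.values())
    word_totals = {w: sum(feats.values()) for w, feats in counts.items()}
    feat_totals = {}
    for feats in counts.values():
        for f, c in feats.items():
            feat_totals[f] = feat_totals.get(f, 0) + c

    weighted = {}
    for w, feats in counts.items():
        weighted[w] = {}
        for f, c in feats.items():
            if c <= 0:
                continue  # no evidence for this word-feature pair
            pmi = math.log2(c * total / (word_totals[w] * feat_totals[f]))
            if pmi > 0:  # keep only positive associations
                weighted[w][f] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(f, 0.0) for f, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy counts, not taken from the study.
counts = {
    "cat": {"purr": 4, "pet": 2},
    "dog": {"bark": 4, "pet": 2},
    "car": {"drive": 6},
}
vectors = ppmi_weight(counts)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
```

In this toy setup, "cat" and "dog" share the feature "pet" and so receive a higher similarity than "cat" and "car", which share no features. Swapping `ppmi_weight` or `cosine` for another function is the kind of parameter change the study evaluates in combination.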

Cite

Dobó, A., & Csirik, J. (2020). A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models. Journal of Quantitative Linguistics, 27(3), 244–271. https://doi.org/10.1080/09296174.2019.1570897
