Distributional semantic models represent the meaning of words as vectors. We introduce a selection method that learns a vector space in which each dimension is a natural word. The method starts from the most frequent words and selects the subset that yields the best performance. Because every dimension of the resulting space is a word, the vectors are directly interpretable; this is the main advantage of the method over fusion methods such as non-negative matrix factorization (NMF) and over neural embedding models. We apply the method to the ukWaC corpus and train a vector space with N=1500 basis words. We report test results on word similarity tasks for the MEN, RG-65, SimLex-999, and WordSim353 gold datasets. The results also show that reducing the number of basis vectors from 5000 to 1500 lowers accuracy by only about 1.5-2%, so good interpretability is achieved without a large performance penalty. Interpretability evaluations indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We also report the top 15 of the 1500 selected basis words in this paper.
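The abstract gives only the outline of the selection procedure. As a rough illustration of the idea, the Python sketch below builds count-based vectors whose dimensions are frequent corpus words and keeps the frequency-ordered candidate basis that scores best on a word-similarity gold set. The greedy prefix search, the context-window counting, and all helper names are assumptions made for exposition, not the authors' algorithm.

```python
# Illustrative sketch only: the paper's exact selection procedure is not
# specified in the abstract, so the search strategy below is hypothetical.
from collections import Counter
import numpy as np
from scipy.stats import spearmanr

def top_frequent_words(tokens, k):
    """Candidate basis pool: the k most frequent corpus words."""
    return [w for w, _ in Counter(tokens).most_common(k)]

def build_space(tokens, basis, window=5):
    """Count co-occurrences with basis words in a +/-window context.
    Each vector dimension is itself a word, which is what makes the
    resulting space directly interpretable."""
    col = {w: j for j, w in enumerate(basis)}
    vecs = {}
    for i, w in enumerate(tokens):
        v = vecs.setdefault(w, np.zeros(len(basis)))
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] in col:
                v[col[tokens[j]]] += 1.0
    return vecs

def spearman_on_gold(vecs, gold):
    """Standard word-similarity evaluation: Spearman correlation between
    cosine similarities and human ratings (e.g., pairs from RG-65)."""
    sims, human = [], []
    for w1, w2, rating in gold:
        if w1 in vecs and w2 in vecs:
            u, v = vecs[w1], vecs[w2]
            denom = np.linalg.norm(u) * np.linalg.norm(v)
            if denom:
                sims.append(float(u @ v / denom))
                human.append(rating)
    return spearmanr(sims, human).correlation if len(sims) > 1 else 0.0

def select_basis(tokens, gold, pool_size=5000, target=1500, step=100):
    """Hypothetical greedy variant: grow the basis in frequency order and
    keep the prefix size (up to `target` words) that scores best."""
    pool = top_frequent_words(tokens, pool_size)
    best_basis, best_score = pool[:target], -1.0
    for n in range(step, target + 1, step):
        basis = pool[:n]
        score = spearman_on_gold(build_space(tokens, basis), gold)
        if score > best_score:
            best_basis, best_score = basis, score
    return best_basis, best_score
```

On a corpus the size of ukWaC one would cache the full co-occurrence matrix once and slice columns per candidate basis rather than rebuilding the space for every candidate size; the rebuild here only keeps the sketch short.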
Citation: Pakzad, A., & Analoui, M. (2021). A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors. Journal of Artificial Intelligence Research, 72, 1281–1305. https://doi.org/10.1613/JAIR.1.13353