Clustering of Russian adjective-noun constructions using word embeddings

Abstract

This paper presents a method for automatically extracting constructions from a large corpus of Russian. Here, 'construction' means a multi-word expression in which a variable slot can be filled by different words from the same semantic class, for example, a glass of [water/juice/milk]. We focus on constructions that consist of a noun and its adjective modifier. We propose grouping such constructions into semantic classes via two-step clustering of word vectors in distributional models. We compare this method with other clustering techniques and evaluate it against A Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns denoting human body parts. The best-performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. The results are publicly available and can be used to build a Russian construction dictionary, accelerate theoretical studies of constructions, and facilitate teaching Russian as a foreign language.
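As a rough illustration of what clustering adjectives by their word vectors can look like, here is a minimal Python sketch. It is not the authors' procedure: the abstract does not say what the two steps are, so the sketch assumes a coarse pass followed by a finer pass within each coarse group. The vectors are made-up toy values rather than output of a real distributional model, the adjectives are hypothetical modifiers of a single noun such as "eyes", and the function names and thresholds are invented for the example.

```python
import numpy as np

# Toy 3-dimensional "embeddings" for adjectives that might modify "eyes".
# ASSUMPTION: values are invented for illustration, not taken from any model.
TOY_VECTORS = {
    "blue":   np.array([1.0, 0.1, 0.0]),
    "green":  np.array([0.9, 0.2, 0.0]),
    "grey":   np.array([0.8, 0.3, 0.1]),
    "tired":  np.array([0.0, 1.0, 0.1]),
    "sleepy": np.array([0.1, 0.9, 0.0]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def threshold_clusters(words, vectors, threshold):
    """Greedy single-pass clustering (hypothetical helper).

    A word joins the first existing cluster whose centroid is at least
    `threshold` cosine-similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for word in words:
        for cluster in clusters:
            centroid = np.mean([vectors[w] for w in cluster], axis=0)
            if cosine(vectors[word], centroid) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def two_step_clusters(words, vectors, coarse=0.5, fine=0.98):
    """ASSUMED two-step scheme: coarse grouping, then a finer split per group."""
    result = []
    for group in threshold_clusters(words, vectors, coarse):
        result.extend(threshold_clusters(group, vectors, fine))
    return result


words = ["blue", "green", "grey", "tired", "sleepy"]
coarse_groups = threshold_clusters(words, TOY_VECTORS, 0.5)
fine_groups = two_step_clusters(words, TOY_VECTORS)
print(coarse_groups)
print(fine_groups)
```

With these toy values the coarse pass separates colour adjectives from state adjectives, and the finer pass additionally splits "grey" off from "blue" and "green". A real experiment would swap in vectors from a trained distributional model and an established clustering algorithm.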

Cite

Kutuzov, A., Kuzmenko, E., & Pivovarova, L. (2017). Clustering of Russian adjective-noun constructions using word embeddings. In BSNLP 2017 - 6th Workshop on Balto-Slavic Natural Language Processing at the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 (pp. 3–13). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-1402
