This paper presents a method for automatically extracting constructions from a large corpus of Russian. The term 'construction' here means a multi-word expression in which a variable slot can be filled by any word from the same semantic class, for example, a glass of [water/juice/milk]. We deal with constructions that consist of a noun and its adjective modifier. We propose a method of grouping such constructions into semantic classes via two-step clustering of word vectors in distributional models. We compare it with other clustering techniques and evaluate it against A Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns denoting human body parts. The best-performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. The results of this procedure are publicly available and can be used to build a Russian construction dictionary, accelerate theoretical studies of constructions, and facilitate teaching Russian as a foreign language.
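The idea of grouping the adjective modifiers of a noun by the similarity of their word vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional `TOY_VECTORS` are invented rather than taken from a trained distributional model, the English adjectives stand in for Russian ones, and the greedy threshold grouping in the hypothetical `group_modifiers` function is a simple stand-in, not the two-step clustering procedure proposed in the paper.

```python
import math

# Invented toy vectors (assumption); a real setup would load embeddings
# from a distributional model trained on a Russian corpus.
TOY_VECTORS = {
    "left": [0.9, 0.1, 0.0],
    "right": [0.8, 0.2, 0.0],
    "strong": [0.1, 0.9, 0.1],
    "muscular": [0.0, 0.8, 0.2],
    "cold": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_modifiers(adjectives, vectors, threshold=0.8):
    """Greedy grouping (illustrative stand-in, not the paper's method).

    Each adjective joins the first cluster whose seed (first member)
    has cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for adj in adjectives:
        for cluster in clusters:
            if cosine(vectors[adj], vectors[cluster[0]]) >= threshold:
                cluster.append(adj)
                break
        else:
            clusters.append([adj])
    return clusters


# Adjectives observed as modifiers of one noun, e.g. "hand" (hypothetical data).
clusters = group_modifiers(
    ["left", "right", "strong", "muscular", "cold"], TOY_VECTORS
)
for cluster in clusters:
    print("[" + "/".join(cluster) + "] hand")
```

Each resulting group corresponds to one candidate construction with a variable adjective slot, in the same spirit as a glass of [water/juice/milk]. The threshold value of 0.8 is arbitrary and would need tuning on real embeddings.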
CITATION STYLE
Kutuzov, A., Kuzmenko, E., & Pivovarova, L. (2017). Clustering of Russian adjective-noun constructions using word embeddings. In BSNLP 2017 - 6th Workshop on Balto-Slavic Natural Language Processing at the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 (pp. 3–13). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-1402