Predicting Concreteness and Imageability of Words within and across Languages via Word Embeddings

Nikola Ljubešić; Darja Fišer; Anita Peti-Stantić

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2018) 217-222

DOI: 10.18653/v1/w18-3028

27 citations

97 readers

Abstract

The notions of concreteness and imageability, traditionally important in psycholinguistics, are gaining significance in semantic-oriented natural language processing tasks. In this paper we investigate the predictability of these two concepts via supervised learning, using word embeddings as explanatory variables. We perform predictions both within and across languages by exploiting collections of cross-lingual embeddings aligned to a single vector space. We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages. We further show that the cross-lingual transfer via word embeddings is more efficient than the simple transfer via bilingual dictionaries.
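The setup described in the abstract can be sketched as follows. A per-word rating is regressed on the word's embedding. For cross-lingual prediction, a model trained on one language is applied to another language whose vectors live in the same aligned space. A simple baseline copies ratings through a bilingual dictionary. This is a minimal illustration under stated assumptions, not the authors' code. The closed-form ridge regressor, the Spearman evaluation, the helper names (`fit_ridge`, `spearman`, `dictionary_transfer`) and the synthetic "aligned embeddings" are all hypothetical stand-ins for real cross-lingual vectors and human rating norms. The noise levels are arbitrary, so the printed correlations say nothing about the results reported in the paper.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(model, X):
    w, b = model
    return X @ w + b


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def dictionary_transfer(dictionary, src_ratings):
    """Baseline: give each target word the rating of its dictionary translation.

    `dictionary` maps target-word index -> source-word index; target words
    without an entry get no prediction (they are not covered).
    """
    covered = sorted(dictionary)
    preds = np.array([src_ratings[dictionary[t]] for t in covered])
    return np.array(covered), preds


# Hypothetical synthetic data standing in for aligned embeddings and norms.
rng = np.random.default_rng(0)
d, n_concepts = 50, 2500
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
Z = rng.normal(size=(n_concepts, d))  # latent concepts in the shared space
c = Z @ w_true                         # latent concreteness of each concept

# Source language: words for concepts 0..1999, small embedding noise.
src = np.arange(2000)
X_src = Z[src] + 0.1 * rng.normal(size=(len(src), d))
y_src = c[src] + 0.3 * rng.normal(size=len(src))

# Target language: concepts 1500..2499, with extra noise standing in for
# imperfect cross-lingual alignment; its first 500 words have translations.
tgt = np.arange(1500, 2500)
X_tgt = Z[tgt] + 0.5 * rng.normal(size=(len(tgt), d))
y_tgt = c[tgt] + 0.3 * rng.normal(size=len(tgt))

# 1) Within-language: train and test on disjoint source-language words.
within = fit_ridge(X_src[:1500], y_src[:1500])
r_within = spearman(predict(within, X_src[1500:]), y_src[1500:])

# 2) Cross-lingual: train on all source words, predict target words directly.
cross = fit_ridge(X_src, y_src)
r_cross = spearman(predict(cross, X_tgt), y_tgt)

# 3) Dictionary baseline: target word i translates to source word 1500 + i.
dictionary = {i: 1500 + i for i in range(500)}
covered, dict_preds = dictionary_transfer(dictionary, y_src)
r_dict = spearman(dict_preds, y_tgt[covered])
coverage = len(covered) / len(tgt)

print(f"within-language rho: {r_within:.3f}")
print(f"cross-lingual rho:   {r_cross:.3f}")
print(f"dictionary rho:      {r_dict:.3f} (coverage {coverage:.0%})")
```

With real data, the rows of `X_src` and `X_tgt` would be pre-aligned cross-lingual embeddings of the rated words, and `y_src` and `y_tgt` the human concreteness or imageability norms. The embedding model can score every target word that has a vector, while the dictionary baseline only scores words with an entry. This is why its coverage is reported alongside its correlation.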

Citation (APA)

Ljubešić, N., Fišer, D., & Peti-Stantić, A. (2018). Predicting Concreteness and Imageability of Words within and across Languages via Word Embeddings. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 217–222). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-3028

Readers' seniority

PhD / Post grad / Masters / Doc: 26 (67%)
Researcher: 9 (23%)
Professor / Associate Prof.: 2
Lecturer / Post doc: 2

Readers' discipline

Computer Science: 29 (64%)
Linguistics: 9 (20%)
Psychology: 5 (11%)
Social Sciences: 2

Cited by

Predicting word concreteness and imagery

Estimating the imageability of words by mining visual characteristics from crawled image data

The Croatian psycholinguistic database: Estimates for 6000 nouns, verbs, adjectives and adverbs