Let’s play mono-poly: BERT can reveal words’ polysemy level and partitionability into senses


Abstract

Pre-trained language models (LMs) encode rich information about linguistic structure but their knowledge about lexical polysemy remains unclear. We propose a novel experimental setup for analyzing this knowledge in LMs specifically trained for different languages (English, French, Spanish, and Greek) and in multilingual BERT. We perform our analysis on datasets carefully designed to reflect different sense distributions, and control for parameters that are highly correlated with polysemy such as frequency and grammatical category. We demonstrate that BERT-derived representations reflect words’ polysemy level and their partitionability into senses. Polysemy-related information is more clearly present in English BERT embeddings, but models in other languages also manage to establish relevant distinctions between words at different polysemy levels. Our results contribute to a better understanding of the knowledge encoded in contextualized representations and open up new avenues for multilingual lexical semantics research.

Citation (APA)

Soler, A. G., & Apidianaki, M. (2021). Let’s play mono-poly: BERT can reveal words’ polysemy level and partitionability into senses. Transactions of the Association for Computational Linguistics, 9, 825–844. https://doi.org/10.1162/tacl_a_00400
