Word Representations Concentrate and This is Good News!

Abstract

This article establishes that, unlike the legacy tf*idf representation, recent natural language representations (word embedding vectors) tend to exhibit a concentration of measure phenomenon: when both the representation size p and the database size n are large, these vectors behave similarly to large-dimensional Gaussian random vectors. This phenomenon may have important consequences, as machine learning algorithms for natural language data become amenable to improvement, thereby providing new theoretical insights into natural language processing.
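The concentration claim can be illustrated numerically. The sketch below is not taken from the article; it simply uses i.i.d. Gaussian vectors as a stand-in for embedding vectors, with illustrative sizes. It shows that as the representation size p grows, the norms and pairwise distances of n such vectors fluctuate less and less around deterministic limits, which is the kind of behavior the abstract refers to.

```python
# Minimal sketch (not from the article): concentration of measure for
# i.i.d. Gaussian vectors used as a stand-in for word embedding vectors.
# All sizes (n, p) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # database size (number of vectors)

for p in (10, 100, 1_000, 10_000):         # representation size p
    X = rng.standard_normal((n, p)) / np.sqrt(p)   # rows ~ N(0, I_p / p)

    # Norms concentrate around 1 as p grows (fluctuations shrink with p).
    norms = np.linalg.norm(X, axis=1)

    # Pairwise distances between distinct rows concentrate around sqrt(2);
    # computed via the Gram matrix to avoid an n x n x p intermediate array.
    G = X @ X.T
    sq = np.diag(G)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G, 0.0)
    dists = np.sqrt(d2[~np.eye(n, dtype=bool)])

    print(f"p={p:6d}  norm mean={norms.mean():.3f} (std {norms.std():.4f})  "
          f"dist mean={dists.mean():.3f} (std {dists.std():.4f})")
```

Running the sketch, the standard deviations of both the norms and the pairwise distances shrink as p increases, while their means stay near 1 and sqrt(2) respectively.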

Citation (APA)
Couillet, R., Cinar, Y. G., Gaussier, E., & Imran, M. (2020). Word Representations Concentrate and This is Good News! In CoNLL 2020 - 24th Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 325–334). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.conll-1.25
