An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments)

Abstract

Gorman and Curran (2006) argue that thesaurus generation for billion+-word corpora is problematic as the full computation takes many days. We present an algorithm with which the computation takes under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at time of writing) seven major world languages. The development is implemented in the Sketch Engine (Kilgarriff et al., 2004). Another innovative development in the same tool is the presentation of the grammatical behaviour of a word against the background of how all other words of the same word class behave. For example, the English noun constraint occurs in the plural 75% of the time. Is this a salient lexical fact? To form a judgement, we need to know the distribution for all nouns. We use histograms to present the distribution in a way that is easy to grasp.
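
The histogram idea can be illustrated with a minimal sketch. It is not the Sketch Engine implementation: the function names (`plural_share`, `bin_of`, `histogram`) are hypothetical, and every count in `TOY_COUNTS` is invented for illustration, except that the 75% plural share for constraint follows the figure quoted in the abstract. The sketch bins each noun's plural share into ten equal-width bins and marks the bin that contains the target noun.

```python
from collections import Counter

# Hypothetical (singular, plural) counts, invented for illustration only.
# Only the 75% plural share of "constraint" mirrors the abstract.
TOY_COUNTS = {
    "constraint": (25, 75),
    "house": (80, 20),
    "idea": (60, 40),
    "scissors": (0, 100),
    "information": (100, 0),
    "problem": (55, 45),
}

def plural_share(singular, plural):
    """Fraction of a noun's occurrences that are plural."""
    total = singular + plural
    return plural / total if total else 0.0

def bin_of(share, bins=10):
    """Index of the equal-width bin holding `share`; 1.0 goes in the last bin."""
    return min(int(share * bins), bins - 1)

def histogram(shares, bins=10):
    """Number of nouns falling into each bin."""
    counts = Counter(bin_of(s, bins) for s in shares)
    return [counts.get(i, 0) for i in range(bins)]

shares = {word: plural_share(*c) for word, c in TOY_COUNTS.items()}
hist = histogram(shares.values())
target_bin = bin_of(shares["constraint"])

# Text rendering: one row per bin, with the target noun's bin marked.
for i, n in enumerate(hist):
    marker = "  <- constraint" if i == target_bin else ""
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}% {'#' * n}{marker}")
```

With a real corpus, the counts would come from a lemmatised and part-of-speech-tagged frequency list. Reading the marked bin against the rest of the distribution shows whether a 75% plural share is unusual for a noun.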

Cite

Rychlý, P., & Kilgarriff, A. (2007). An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments). In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 41–44). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1557769.1557783
