Extracting clusters of specialist terms from unstructured text

Abstract

Automatically identifying related specialist terms is a difficult and important task required to understand the lexical structure of language. This paper develops a corpus-based method of extracting coherent clusters of satellite terminology (terms on the edge of the lexicon) using co-occurrence networks of unstructured text. Term clusters are identified by extracting communities in the co-occurrence graph, after which the largest community is discarded and the remaining words are ranked by centrality within their community. The method is tractable on large corpora and requires no document structure and only minimal normalization. The results suggest that the model can extract coherent groups of satellite terms from corpora of varying size, content and structure. The findings also confirm that language consists of a densely connected core (observed in dictionaries) and systematic, semantically coherent groups of terms at the edges of the lexicon.
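
The pipeline described in the abstract (build a co-occurrence graph, extract communities, discard the largest, rank the rest by centrality) can be illustrated with a minimal sketch. The abstract does not say which community-detection algorithm or centrality measure the paper uses, so this sketch makes its own assumptions: sentence-level co-occurrence, a simple deterministic label-propagation pass as a stand-in for community detection, and within-community weighted degree as the centrality score. All function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Weighted undirected graph: edge weight = number of sentences shared."""
    weights = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            weights[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph


def label_propagation(graph, max_rounds=100):
    """Stand-in community detection (an assumption, not the paper's method).

    Each node repeatedly adopts the label with the highest total edge weight
    among its neighbours; ties go to the alphabetically smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for neighbour, w in graph[node].items():
                scores[labels[neighbour]] += w
            best = max(scores.values())
            new_label = min(l for l, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def rank_by_strength(graph, community):
    """Rank words by weighted degree inside their own community (assumed centrality)."""
    def strength(word):
        return sum(w for nbr, w in graph[word].items() if nbr in community)
    return sorted(community, key=lambda word: (-strength(word), word))


def satellite_clusters(sentences):
    """Drop the largest community (the core) and rank terms in the others."""
    graph = cooccurrence_graph(sentences)
    communities = label_propagation(graph)
    largest = max(communities, key=len)
    ranked = [rank_by_strength(graph, c) for c in communities if c is not largest]
    return sorted(ranked)


# Toy corpus: one dense "core" plus two weakly attached specialist groups.
corpus = (
    ["time people way day thing"] * 2
    + ["quark gluon hadron"] + ["quark gluon"] * 2 + ["quark hadron"]
    + ["allele genome codon"] + ["allele genome"] * 2 + ["allele codon"]
    + ["time quark", "people allele"]
)

print(satellite_clusters(corpus))
# → [['allele', 'genome', 'codon'], ['quark', 'gluon', 'hadron']]
```

On this toy input the five general words form the largest community and are discarded, leaving the two specialist groups, each ordered by how strongly a term is tied to the rest of its group. On a real corpus a more robust community-detection method and some normalization would likely be needed.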

Cite

Gerow, A. (2014). Extracting clusters of specialist terms from unstructured text. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1426–1434). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1149
