A statistical approach to term extraction

  • Nazar R

Abstract

This paper argues in favor of a statistical approach to terminology extraction that is general to all languages but uses language-specific parameters. In contrast to many application-oriented terminology studies, which focus on a particular language and domain, this paper adopts some general principles concerning the statistical properties of terms, together with a method for obtaining the corresponding language-specific parameters. This method is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms. The proposal is theoretically and computationally simple and does not rely on resources such as linguistic or ontological knowledge. The algorithm learns to identify terms during a training phase in which it is shown examples of both terminological and non-terminological units. With these examples, the algorithm creates a model of the terminology that accounts for the frequency of lexical, morphological and syntactic elements of the terms in relation to the non-terminological vocabulary. The model is then used to identify new terminology in previously unseen text. The comparative evaluation shows that performance is significantly higher than that of other well-known systems.
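
The training-then-scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a smoothed log-ratio of feature frequencies as the scoring rule, whole words as lexical features and three-letter word endings as a crude stand-in for morphological features, and it leaves out syntactic features entirely. The function names (`features`, `train`, `score`) and the toy training examples are hypothetical.

```python
from collections import Counter
import math


def features(unit):
    """Lexical features (whole words) plus crude morphological ones
    (three-letter endings of words longer than three characters)."""
    words = unit.lower().split()
    feats = [f"w:{w}" for w in words]
    feats += [f"suf:{w[-3:]}" for w in words if len(w) > 3]
    return feats


def train(terms, non_terms):
    """Count feature frequencies separately for terms and non-terms."""
    term_counts = Counter(f for u in terms for f in features(u))
    other_counts = Counter(f for u in non_terms for f in features(u))
    return term_counts, other_counts


def score(unit, model, smoothing=1.0):
    """Sum of smoothed log-ratios; a positive value means the unit's
    features are relatively more frequent among the training terms."""
    term_counts, other_counts = model
    t_total = sum(term_counts.values())
    o_total = sum(other_counts.values())
    vocab = len(set(term_counts) | set(other_counts))
    total = 0.0
    for f in features(unit):
        p_term = (term_counts[f] + smoothing) / (t_total + smoothing * vocab)
        p_other = (other_counts[f] + smoothing) / (o_total + smoothing * vocab)
        total += math.log(p_term / p_other)
    return total


# Toy training data, invented for illustration only.
model = train(
    terms=["gastritis", "hepatitis", "chronic bronchitis", "renal carcinoma"],
    non_terms=["the table", "very quickly", "a long walk", "went home"],
)
print(score("arthritis", model))  # unseen word with a term-like ending
print(score("the home", model))   # made of non-terminological vocabulary
```

With this toy data, an unseen unit such as "arthritis" scores above zero because its ending is frequent among the training terms, while "the home" scores below zero. A realistic setup would need far larger training sets and a threshold tuned on held-out data.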

Cite

Nazar, R. (2011). A statistical approach to term extraction. International Journal of English Studies, 11(2), 159. https://doi.org/10.6018/ijes/2011/2/149691
