A statistical approach to term extraction

Rogelio Nazar

Journal Article (Open Access)

International Journal of English Studies (2011) 11(2) 159

DOI: 10.6018/ijes/2011/2/149691

Abstract

This paper argues in favor of a statistical approach to terminology extraction that is general to all languages but uses language-specific parameters. In contrast to many application-oriented terminology studies, which focus on a particular language and domain, this paper adopts some general principles concerning the statistical properties of terms, together with a method to obtain the corresponding language-specific parameters. This method is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms. The proposal is theoretically and computationally simple and does not rely on resources such as linguistic or ontological knowledge. The algorithm learns to identify terms during a training phase in which it is shown examples of both terminological and non-terminological units. With these examples, the algorithm creates a model of the terminology that accounts for the frequency of lexical, morphological and syntactic elements of the terms in relation to the non-terminological vocabulary. The model is then used to identify new terminology in previously unseen text. The comparative evaluation shows that its performance is significantly higher than that of other well-known systems.
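
To make the training idea in the abstract concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes a smoothed log-likelihood ratio as the scoring function, whole words as lexical features, and three-character suffixes as a crude stand-in for morphological features; syntactic features are left out. The function names (`features`, `train`, `score`) and the toy example lists are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def features(unit):
    """Return lexical (whole-word) and crude morphological (3-char suffix) features."""
    feats = []
    for word in unit.lower().split():
        feats.append("w=" + word)
        if len(word) > 3:
            feats.append("suf=" + word[-3:])
    return feats


def train(terms, non_terms):
    """Count feature frequencies separately in term and non-term examples."""
    term_counts = Counter(f for u in terms for f in features(u))
    other_counts = Counter(f for u in non_terms for f in features(u))
    return term_counts, other_counts


def score(unit, model):
    """Sum of add-one-smoothed log ratios; positive values lean towards 'term'."""
    term_counts, other_counts = model
    vocab = len(set(term_counts) | set(other_counts))
    term_total = sum(term_counts.values())
    other_total = sum(other_counts.values())
    total = 0.0
    for f in features(unit):
        p_term = (term_counts[f] + 1) / (term_total + vocab)
        p_other = (other_counts[f] + 1) / (other_total + vocab)
        total += math.log(p_term / p_other)
    return total


# Toy training data (illustrative only, not taken from the paper).
TERMS = ["acute gastritis", "chronic hepatitis", "viral meningitis"]
NON_TERMS = ["next morning", "big house", "the following table"]

model = train(TERMS, NON_TERMS)
for candidate in ["bronchitis", "evening"]:
    label = "term-like" if score(candidate, model) > 0 else "not term-like"
    print(candidate, label)
```

With this toy data, an unseen word ending in "-tis" scores above zero because that suffix is frequent among the term examples, while a word ending in "-ing" scores below zero. A realistic setup would need far larger training lists and the language-specific parameters the abstract refers to.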

Cite

Nazar, R. (2011). A statistical approach to term extraction. International Journal of English Studies, 11(2), 159. https://doi.org/10.6018/ijes/2011/2/149691
