Abstract
This paper argues in favor of a statistical approach to terminology extraction that is general to all languages but uses language-specific parameters. In contrast to many application-oriented terminology studies, which focus on a particular language and domain, this paper adopts some general principles about the statistical properties of terms and a method to obtain the corresponding language-specific parameters. This method is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms. The proposal is theoretically and computationally simple and does not rely on resources such as linguistic or ontological knowledge. The algorithm learns to identify terms during a training phase in which it is shown examples of both terminological and non-terminological units. With these examples, the algorithm creates a model of the terminology that accounts for the frequency of lexical, morphological and syntactic elements of the terms in relation to the non-terminological vocabulary. The model is then used to identify new terminology in previously unseen text. The comparative evaluation shows that its performance is significantly higher than that of other well-known systems.
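The training idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple smoothed log-odds model over word forms and three-character suffixes, and it leaves out the syntactic elements the abstract mentions. The function names (`features`, `train`, `score`) and the tiny example lists are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def features(unit):
    """Return lexical (word form) and crude morphological (3-char suffix) features."""
    words = unit.lower().split()
    feats = [f"w={w}" for w in words]
    feats += [f"suf={w[-3:]}" for w in words if len(w) > 3]
    return feats

def train(terms, non_terms):
    """Learn a log-odds weight per feature from term and non-term examples."""
    pos, neg = Counter(), Counter()
    for unit in terms:
        pos.update(features(unit))
    for unit in non_terms:
        neg.update(features(unit))
    vocab = set(pos) | set(neg)
    # Add-one smoothing so features seen in only one class stay finite.
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        f: math.log((pos[f] + 1) / pos_total) - math.log((neg[f] + 1) / neg_total)
        for f in vocab
    }

def score(model, unit):
    """Average feature weight; positive values lean towards 'term'."""
    feats = features(unit)
    if not feats:
        return 0.0
    # Features never seen in training contribute a neutral weight of 0.
    return sum(model.get(f, 0.0) for f in feats) / len(feats)

# Hypothetical toy training data, not taken from the paper.
TERMS = ["hepatitis", "gastritis", "chronic bronchitis", "nephropathy"]
NON_TERMS = ["the house", "a long table", "several people", "walking home"]

model = train(TERMS, NON_TERMS)
for candidate in ["arthritis", "the table"]:
    label = "term-like" if score(model, candidate) > 0 else "not term-like"
    print(candidate, "->", label)
```

In this sketch an unseen word such as "arthritis" is scored as term-like only because it shares the suffix "tis" with the training terms, which is the kind of frequency contrast between terminological and general vocabulary that the abstract describes.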
Cite
Nazar, R. (2011). A statistical approach to term extraction. International Journal of English Studies, 11(2), 159. https://doi.org/10.6018/ijes/2011/2/149691