Paradigmatic modifiability statistics for the extraction of complex multi-word terms

Abstract

We here propose a new method which sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lesser degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodology is essentially domain- and corpus-size-independent. © 2005 Association for Computational Linguistics.
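The abstract does not spell out the measure itself, so the following is only an illustrative sketch of the underlying idea, not the authors' definition. It assumes that limited modifiability can be approximated by replacing every subset of token slots in an n-gram with a wildcard and comparing the n-gram's frequency with the frequency of each wildcard pattern, then multiplying the ratios. The function names `slot_pattern_counts` and `modifiability_score` and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def wildcard_patterns(ngram):
    """Yield every pattern obtained by replacing k >= 1 slots with None."""
    n = len(ngram)
    for k in range(1, n + 1):
        for slots in combinations(range(n), k):
            yield tuple(None if i in slots else tok for i, tok in enumerate(ngram))


def slot_pattern_counts(ngram_counts):
    """Sum the frequencies of all n-grams that match each wildcard pattern."""
    pattern_counts = Counter()
    for ngram, freq in ngram_counts.items():
        for pattern in wildcard_patterns(ngram):
            pattern_counts[pattern] += freq
    return pattern_counts


def modifiability_score(ngram, ngram_counts, pattern_counts):
    """Product of f(ngram) / f(pattern) over all wildcard patterns.

    Assumption: a score near 1 means the slots are rarely filled by
    other tokens (term-like); a score near 0 means the slots vary freely.
    """
    freq = ngram_counts[ngram]
    score = 1.0
    for pattern in wildcard_patterns(ngram):
        score *= freq / pattern_counts[pattern]
    return score


# Toy bigram counts, invented purely for illustration.
toy_counts = {
    ("stem", "cell"): 6,
    ("new", "cell"): 2,
    ("new", "method"): 2,
}
toy_patterns = slot_pattern_counts(toy_counts)
ranked = sorted(
    toy_counts,
    key=lambda g: modifiability_score(g, toy_counts, toy_patterns),
    reverse=True,
)
```

On these toy counts, `("stem", "cell")` ranks above `("new", "cell")` because "stem" is never followed by anything other than "cell", whereas "new" combines freely. A real evaluation would use noun phrases extracted from a corpus and a curated terminology as the gold standard, as the abstract describes.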

Wermter, J., & Hahn, U. (2005). Paradigmatic modifiability statistics for the extraction of complex multi-word terms. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 843–850). https://doi.org/10.3115/1220575.1220681
