The availability of multi-word units (MWUs) in NLP lexica has important applications: it enhances parsing precision, aids attachment decisions, and enables more natural interaction between non-specialist users and information retrieval engines, among others. Most statistical approaches to MWU extraction from corpora measure the association between two words, define thresholds for deciding which bigrams may be elected as possible units, and apply complex linguistic filters and language-specific morpho-syntactic rules to filter those units. In this paper we present: a new algorithm (LocalMaxs) for extracting complex units made up of 2 or more adjacent words (n-grams, with n ≥ 2); a new measure of "glue", or association, between the words of an n-gram of any size; an exhaustive comparison of our association measure with other known measures (log-likelihood, χ², etc.); and a new normalization, the fair dispersion point normalization, for current statistical measures (log-likelihood, χ², etc.) that enhances the precision and recall of the MWUs extracted by these measures.
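The LocalMaxs idea named above selects an n-gram as a candidate MWU when its glue score is a local maximum relative to its immediate sub-grams ((n-1)-grams) and super-grams ((n+1)-grams). A minimal Python sketch of that selection rule, assuming a precomputed glue function and a candidate set (the names `localmaxs` and `glue`, and the toy scores below, are illustrative, not the authors' reference implementation):

```python
def localmaxs(glue, candidates):
    """Select n-grams whose glue is a local maximum among neighbours.

    glue:       maps an n-gram (tuple of words) to its association score.
    candidates: iterable of n-grams observed in the corpus.

    Assumed criterion: for n > 2 the glue must be >= that of both
    (n-1)-gram sub-grams; for all n it must be > that of every
    (n+1)-gram super-gram present in the candidate set.
    """
    cand = set(candidates)
    selected = []
    for ng in cand:
        n = len(ng)
        # Immediate sub-grams: drop the last or the first word (n > 2 only).
        subs = [ng[:-1], ng[1:]] if n > 2 else []
        # Immediate super-grams: candidates extending ng by one word.
        supers = [s for s in cand
                  if len(s) == n + 1 and (s[:-1] == ng or s[1:] == ng)]
        g = glue(ng)
        if all(g >= glue(s) for s in subs) and all(g > glue(s) for s in supers):
            selected.append(ng)
    return selected


# Toy glue scores (hypothetical values for illustration only).
scores = {("new", "york"): 0.9,
          ("york", "city"): 0.4,
          ("new", "york", "city"): 0.5}
print(localmaxs(scores.get, scores.keys()))  # only ("new", "york") survives
```

Here ("new", "york") wins because its glue exceeds that of its only super-gram, while ("york", "city") loses to the same super-gram and the trigram loses to its sub-gram; any real glue measure (e.g. the one proposed in the paper) could be substituted for the toy dictionary.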
Ferreira, J. (1999). A Local Maxima method and a Fair Dispersion Normalization for extracting multi-word units from corpora. Sixth Meeting on Mathematics of Language, 369–381. Retrieved from http://hlt.di.fct.unl.pt/jfs/MOL99.pdf