An endogeneous corpus-based method for structural noun phrase disambiguation

Abstract

In this paper, we describe a method for structural noun phrase disambiguation which relies mainly on examining the text corpus under analysis and does not need to integrate any domain-dependent lexico- or syntactico-semantic information. This method is implemented in the Terminology Extraction Software LEXTER. We first explain why integrating LEXTER into the LEXTER-K project, which aims at building a tool for knowledge extraction from large technical text corpora, requires improving the quality of the terminology extracted by LEXTER. Then we briefly describe how LEXTER works and show what kind of disambiguation it has to perform when parsing "maximal-length" noun phrases. We introduce a disambiguation method that relies on a very simple idea: whenever LEXTER has to choose among several competing noun sub-groups in order to disambiguate a maximal-length noun phrase, it checks, for each of these sub-groups, whether it occurs anywhere else in the corpus in a non-ambiguous situation, and then makes a choice. Analysis of a half-million-word corpus shows that this disambiguation strategy is efficient. The average rates are: 27% no disambiguation, 70% correct disambiguation, and 3% wrong disambiguation.
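
The core idea (look for each competing sub-group elsewhere in the corpus, in an unambiguous context) can be illustrated with a minimal sketch. This is not LEXTER's implementation; it rests on simplifying assumptions: phrases are reduced to tuples of content words, a two-word phrase found on its own counts as a "non-ambiguous situation", a three-word phrase (w1, w2, w3) has exactly two competing sub-groups (w1, w2) and (w2, w3), and the hypothetical decision rule picks a sub-group only when exactly one of them is attested, leaving the phrase undecided otherwise. The function names `count_unambiguous` and `disambiguate` and the toy corpus are invented for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Phrase = Tuple[str, ...]


def count_unambiguous(phrases: Sequence[Phrase]) -> Counter:
    """Count two-word groups that occur on their own in the corpus.

    Assumption: a standalone two-word phrase is treated as a
    non-ambiguous occurrence of that sub-group.
    """
    return Counter(p for p in phrases if len(p) == 2)


def disambiguate(np3: Phrase, unambiguous: Counter) -> Optional[Phrase]:
    """Choose between the competing sub-groups of a three-word phrase.

    (w1, w2) corresponds to the bracketing [[w1 w2] w3];
    (w2, w3) corresponds to the bracketing [w1 [w2 w3]].
    Hypothetical rule: return the sub-group if exactly one candidate is
    attested elsewhere; otherwise return None (no disambiguation).
    """
    w1, w2, w3 = np3
    candidates = [(w1, w2), (w2, w3)]
    attested = [c for c in candidates if unambiguous[c] > 0]  # Counter gives 0 for unseen keys
    if len(attested) == 1:
        return attested[0]
    return None


# Toy corpus of extracted phrases (invented, for illustration only).
corpus = [
    ("cooling", "water"),
    ("cooling", "water"),
    ("pressure", "valve"),
    ("cooling", "water", "pump"),
    ("safety", "pressure", "valve"),
    ("main", "steam", "line"),
]
counts = count_unambiguous(corpus)

if __name__ == "__main__":
    for np3 in (p for p in corpus if len(p) == 3):
        print(np3, "->", disambiguate(np3, counts))
```

Here "cooling water pump" resolves to the sub-group ("cooling", "water"), "safety pressure valve" to ("pressure", "valve"), and "main steam line" stays undecided because neither candidate is attested alone. Returning None rather than guessing mirrors the abstract's separate "no disambiguation" outcome; how the actual method breaks ties when several sub-groups are attested is not stated here, so the rule above is only one possible choice.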

Bourigault, D. (1993). An endogeneous corpus-based method for structural noun phrase disambiguation. In 6th Conference of the European Chapter of the Association for Computational Linguistics, EACL 1993 - Proceedings (pp. 81–86). Association for Computational Linguistics (ACL). https://doi.org/10.3115/976744.976755