Extracting named entities. A statistical approach

  • Silva J
  • Kozareva Z
  • Noncheva V
  • et al.

Abstract

Named entities and more generally Multiword Lexical Units (MWUs) are important for various applications. However, language-independent methods for automatically extracting MWUs do not provide us with clean data. So, in this paper we propose a method for selecting possible named entities from automatically extracted MWUs, and later, a statistics-based language-independent unsupervised approach is applied to possible named entities in order to cluster them according to their type. Statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; other clusters still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.

Cite

Silva, J., Kozareva, Z., Noncheva, V., & Lopes, G. (2004). Extracting named entities. A statistical approach. Proceedings of the XIème Conférence sur le Traitement des Langues Naturelles (TALN), 19–22 April, Fez, Morocco, 11(1), 347–351. Retrieved from http://www.afcp-parole.org/doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/taln2004-Fez/Silva-Kozareva-Lopes.pdf
